Applications of Convergence - Part 1

Jan. 13, 2019 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\newcommand{\st}{\,\big|\,}$ $\newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$ $\newcommand{\P}[1]{\mathbb{P}\left(#1\right)}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

Now we are ready to state some of the most important results in statistics. We begin with the law of large numbers.

The Law of Large Numbers

Intuitively, the law of large numbers says that if we repeat a chance experiment enough times, the average result will be close to the mean of its distribution. For instance, if we flip a fair coin enough times, the proportion of heads will get close to 1/2. But what does "get close" really mean? If we let $X_n$ denote the number of heads in $n$ coin tosses, then as $n$ becomes large, the proportion $X_n/n$ will be close to $1/2$, but the difference between heads and tails, $D = X_n-(n-X_n) = 2X_n-n$, does not converge to 0, contrary to what our intuition might first suggest! This quantity $D$ is known as a symmetric random walk: after each flip, $D$ changes by 1 in either direction, each with probability 1/2.

In [108]:
set.seed(9)
flips <- rbinom(1000, 1, 1/2)      # 1 = heads, 0 = tails
D <- 2*cumsum(flips) - (1:1000)    # running difference: heads minus tails

options(repr.plot.width=5, repr.plot.height=4)
plot(1:1000, D, type='l', xlab='Flips', ylab='D', main='Random Walk')
abline(h=0, col='red')

As a matter of fact, by the central limit theorem, which we will discuss shortly, the fluctuation $D$ grows proportionally to $\sqrt{n}$ as $n$ increases. This does not contradict the law of large numbers, however: the proportion divides by $n$, which dominates $\sqrt{n}$ as $n$ gets large, so the fluctuation becomes asymptotically negligible when calculating proportions.
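As a quick empirical check of this $\sqrt{n}$ growth, note that $D_n = 2X_n - n$ with $X_n \sim \text{Binomial}(n, 1/2)$ has variance exactly $n$, so its standard deviation is $\sqrt{n}$. The sketch below compares the empirical and theoretical values (the sample sizes and the 5000 replications are arbitrary choices).

In [ ]:
ns <- c(100, 400, 1600, 6400)
# empirical sd of D_n over 5000 replications, versus the theoretical sqrt(n)
rbind(empirical   = sapply(ns, function(n) sd(2*rbinom(5000, n, 1/2) - n)),
      theoretical = sqrt(ns))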

Formally speaking, let $T_1,...,T_n$ denote the results of the $n$ tosses where $T_i=0$ if the $i$th toss is tails and $T_i=1$ if heads. The difference can be represented by

$$ \begin{aligned} D &= 2(T_1+\cdots + T_n) - n \\ &= 2(T_1+\cdots + T_n - n/2) \end{aligned} $$

By the central limit theorem, the quantity $(T_1+\cdots + T_n - n/2)$ has approximately a normal distribution $N(0, \sigma\sqrt{n})$ where $\sigma=1/2$ is the standard deviation of the $T_i$'s. This explains the increase in the fluctuation. On the other hand, the proportion $P_n$ of heads in $n$ coin tosses is represented as

$$ P_n = \frac{1}{n}(T_1+\cdots +T_n), $$

which by the central limit theorem is approximately distributed as $N(0.5, \frac{1}{2\sqrt{n}})$. This means the proportion becomes more and more "concentrated" around 0.5, as the short check below illustrates. We then formalize the notion of the LLN.
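A minimal sketch of this concentration (again with arbitrary sample sizes and 5000 replications): the standard deviation of $P_n$ should shrink like $\frac{1}{2\sqrt{n}}$.

In [ ]:
ns <- c(100, 400, 1600, 6400)
# empirical sd of P_n = X_n/n, versus the theoretical 1/(2*sqrt(n))
rbind(empirical   = sapply(ns, function(n) sd(rbinom(5000, n, 1/2)/n)),
      theoretical = 1/(2*sqrt(ns)))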

Theorem 1 (The Weak Law of Large Numbers)   Let $X_1,...,X_n$ be iid random variables with finite mean $\mu$ and finite standard deviation $\sigma$. Then

$$\overline{X}_n = \frac{X_1+\cdots+X_n}{n} \overset{p}{\to}\mu $$

Proof   By Markov's inequality,

$$ \begin{aligned} \P{\abs{\overline{X}_n-\mu}>\epsilon} &=\P{\abs{\overline{X}_n-\mu}^2 >\epsilon^2} \\ &\leq \frac{\E{\abs{\overline{X}_n-\mu}^2}}{\epsilon^2} & [\text{Markov}]\\ &= \frac{\text{Var}\left(\overline{X}_n\right)}{\epsilon^2} \\ &= \frac{\sigma^2}{n\epsilon^2} & [\text{independence}]. \end{aligned} $$

This bound tends to 0 as $n \to \infty$, which is exactly convergence in probability. $\blacksquare$
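A small simulation can illustrate the bound $\sigma^2/(n\epsilon^2)$ appearing in the proof. The sketch below uses Exponential(1) samples, so $\mu=\sigma=1$; the choices of $\epsilon$, the sample sizes, and the 2000 replications are arbitrary.

In [ ]:
eps <- 0.1
ns  <- c(400, 1600, 6400)
# empirical P(|Xbar_n - mu| > eps) versus the bound sigma^2/(n*eps^2)
emp <- sapply(ns, function(n) mean(replicate(2000, abs(mean(rexp(n, 1)) - 1) > eps)))
rbind(empirical = emp, bound = 1/(ns*eps^2))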

Before proceeding, we need the following important results.

Lemma 1 (Chernoff Bound)   Let $X$ be a random variable, then

$$ \P{X>\epsilon} \leq \inf_{t\geq 0} e^{-t\epsilon}M_X(t), $$ where $M_X(t)=\E{e^{tX}}$ denotes the moment-generating function of $X$.

Proof   For any $t>0$, we have

$$ \begin{aligned} \P{X>\epsilon} &= \P{e^{tX}>e^{t\epsilon}} \\ &\leq e^{-t\epsilon}\E{e^{tX}} & [\text{Markov}] \\ &= e^{-t\epsilon}M_X(t). \end{aligned} $$

Since this holds for every $t > 0$ (and trivially for $t=0$, where the bound equals 1), taking the infimum over $t\geq 0$ proves the lemma. $\blacksquare$
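As a concrete illustration (not part of the proof), for a standard normal $X$ we have $M_X(t) = e^{t^2/2}$, so the bound is $\inf_{t\geq 0} e^{-t\epsilon + t^2/2} = e^{-\epsilon^2/2}$. The sketch below evaluates the infimum numerically with optimize and compares it with the exact tail; $\epsilon=2$ and the search interval $[0, 10]$ are arbitrary choices.

In [ ]:
eps <- 2
G <- function(t) exp(-t*eps + t^2/2)   # e^{-t*eps} * M_X(t) for X ~ N(0, 1)
optimize(G, c(0, 10))$objective        # Chernoff bound, exp(-eps^2/2) ~ 0.135
1 - pnorm(eps)                         # exact tail P(X > eps) ~ 0.023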

Lemma 2 (Borel-Cantelli)   Let $E_1,E_2,...$ be a sequence of events. Then,

$$\sum_{n=1}^{\infty} \P{E_n}<\infty \implies \P{\limsup_{n\to\infty} E_n} = 0.$$

where the limsup is defined as

$$\limsup_{n\to\infty} E_n = \bigcap_{n=1}^{\infty}\bigcup_{m\geq n} E_m,$$ which represents the set of outcomes that occur infinitely often (i.o.) in the infinite sequence of events $\{E_n\}$.

Proof   Since the series $\sum \P{E_n}$ converges, we must have the tail sum go to zero, i.e.,

$$ \lim_{n\to\infty} \sum_{m\geq n}\P{E_m} = 0. $$

This implies that

$$ \inf_{n\geq 1} \sum_{m\geq n}\P{E_m} = 0. $$

Hence,

$$ \begin{aligned} \P{\limsup_{n\to\infty} E_n} &= \P{\bigcap_{n=1}^{\infty}\bigcup_{m\geq n}E_m} \\ &= \inf_{n\geq 1}\P{\bigcup_{m\geq n}E_m} & [\text{decreasing sets}] \\ &\leq \inf_{n\geq 1}\sum_{m\geq n}\P{E_m} & [\text{Bonferroni}]\\ &=0. \end{aligned} $$

This concludes the proof. $\blacksquare$

Theorem 2 (Strong Law of Large Numbers)   Under the same conditions as Theorem 1 (and, for the proof given here, assuming the moment-generating function of the $X_i$'s exists), we have that $\overline{X}_n \overset{a.s.}{\to}\mu$.

Proof   Fix $\epsilon>0$. We have

$$ \P{\abs{\overline{X}_n-\mu}\geq\epsilon} =\P{\overline{X}_n \geq \mu+\epsilon} + \P{\overline{X}_n\leq\mu-\epsilon} $$

Assume that the moment-generating function $M_X(t)$ of the $X_i$'s exists. The moment-generating function of $X_1+\cdots+X_n$ is then $\left[M_X(t)\right]^n$. By the Chernoff bound, we have

$$ \begin{aligned} \P{\overline{X}_n \geq \mu + \epsilon} &\leq \inf_{t\geq0} e^{-n(\mu+\epsilon)t} \left[M_X(t)\right]^n \\ &= \inf_{t\geq 0}\left[e^{-(\mu+\epsilon)t}M_X(t)\right]^n \\ &\leq \lambda^n, \end{aligned} $$

where $\lambda = \inf_{t\geq 0} e^{-(\mu+\epsilon)t}M_X(t)$. Let $G(t) = e^{-(\mu+\epsilon)t}M_X(t)$. Since $M_X(0)=1$ and $M'_X(0)=\mu$, we have $G(0)=1$, and

$$ G'(0) = -(\mu+\epsilon)+\mu = -\epsilon <0. $$

Since $G$ is nonnegative, and $G(0)=1$ together with $G'(0)<0$ implies that $G(t_0)<1$ for some small $t_0>0$, we conclude that $0\leq \lambda <1$. Similarly, we can show that

$$ \P{\overline{X}_n \leq \mu-\epsilon} \leq \eta^n, $$

where $\eta = \inf_{t\geq 0} e^{(\mu-\epsilon)t} M_X(-t)$ is obtained by applying the Chernoff bound to $-\overline{X}_n$, and $0\leq\eta<1$ by the same argument as above. Thus, we have

$$ \begin{aligned} \sum_{n=1}^{\infty}\P{\abs{\overline{X}_n-\mu}\geq\epsilon} &\leq \sum_{n=1}^{\infty}(\lambda^n+\eta^n) \\ &=\frac{\lambda}{1-\lambda}+\frac{\eta}{1-\eta} \\ &< \infty. \end{aligned} $$

Now, invoking the Borel-Cantelli lemma with $\epsilon = 1/k$, for any positive integer $k$ we have that $\P{A_k}=0$, where

$$ \begin{aligned} A_{k} &= \limsup_{n\to\infty} \left\{\omega\in\Omega \st \abs{\overline{X}_n(\omega)-\mu}\geq \frac{1}{k} \right\} \\ &= \left\{\omega\in\Omega \st \abs{\overline{X}_n(\omega)-\mu}\geq \frac{1}{k} \text{ infinitely often}\right\}. \end{aligned} $$

The complement of $A_{k}$, denoted by $A^c_{k}$, is the set

$$ A^c_{k} =\left\{\omega\in\Omega \st \abs{\overline{X}_n(\omega)-\mu}<\frac{1}{k} \text{ for all sufficiently large } n \right\}, $$

where $\P{A^c_{k}}=1$. As $k$ increases, we have a decreasing sequence of sets $\{A^c_k\}_{k=1}^{\infty}$. Using the continuity property of the probability measure,

$$ \begin{aligned} \P{\lim_{n\to\infty} \overline{X}_n=\mu} &= \P{\bigcap_{k=1}^{\infty} A^c_k} \\ &= \lim_{k\to\infty} \P{A^c_{k}} & [\text{continuity}]\\ &=1. \end{aligned} $$

This proves almost sure convergence. $\blacksquare$

So what does the strong version say that the weak version can't? The strong law of large numbers shows that no matter how small $\epsilon$ is, eventually the sample mean will be within a distance $\epsilon$ of $\mu$ and stay there. This conclusion cannot be drawn from the weak law of large numbers, which only states that for a specified large $n$, the sample mean is likely to be near $\mu$. The weak law is sharpened by the central limit theorem, which states that for a fixed $n$,

$$ \P{\abs{\overline{X}_n - \mu} \geq \epsilon} \approx 2\left(1-\Phi(\epsilon\sqrt{n}/\sigma)\right), $$

where $\sigma$ is the standard deviation of each sample. With the strong law of large numbers, we can further claim that, with probability 1, for any given $\epsilon>0$ there exists an $N$ (possibly depending on the realized sequence) such that the error is no more than $\epsilon$ whenever $n>N$.
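To see the central limit approximation above in action, the sketch below simulates it with Exponential(1) samples (so $\mu=\sigma=1$); the values of $n$, $\epsilon$, and the 5000 replications are arbitrary choices.

In [ ]:
n <- 400; eps <- 0.1
xbar <- replicate(5000, mean(rexp(n, 1)))
mean(abs(xbar - 1) >= eps)        # simulated P(|Xbar_n - mu| >= eps)
2*(1 - pnorm(eps*sqrt(n)))        # CLT approximation with sigma = 1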

The Renewal-Reward Theorem

Now we show an interesting application of the law of large numbers. Consider a machine that ages and needs to be replaced once in a while. The machine has a continuously distributed lifetime and is replaced either upon failure or when its age reaches $T>0$. If the machine lasts until time $T$, there is a fixed cost $c_1>0$ for the planned replacement. On the other hand, if the machine fails before time $T$, the cost of a failure replacement is $c_2>0$, which is larger than $c_1$. A reward $R_k$ is earned by the $k$th machine during its lifetime $L_k\leq T$. This is a classic example of a regenerative stochastic process: each time we replace an old machine, the process restarts and becomes independent of its past. The problem can be approached using a Monte Carlo simulation.

We generate iid random variables $L_1,L_2,...$ by taking the minimum of $T=1$ and a value drawn from an exponential distribution with rate parameter $\lambda=1$. Furthermore, we assume that the reward earned by the $k$th machine during its lifetime $L_k$ is distributed as $\text{Poisson}(L_k)$; in the simulation below, the net reward $R_k$ also subtracts the replacement cost. So we have the following hierarchical process:

$$ \begin{aligned} R_k \given L_k &\sim \text{Poisson}(L_k) \\ L_k &= \min(T, Y) \\ Y &\sim \text{Exponential}(1) \end{aligned} \tag{1} $$

First we generate $n$ cycle lengths $L_1,...,L_n$ from the exponential distribution with rate 1, truncating any value larger than $T$ at $T$, where $T$ is the age at which a planned replacement occurs. At the same time, we also generate a list of rewards based on (1).

In [205]:
n <- 1000 # number of cycles
c1 <- 0.1 # cost for planned replacement
c2 <- 0.2 # cost for failure replacement
lambda <- 1 # rate of the exponential lifetime distribution
T <- 1 # maximum time before replacement

L <- rep(0,n) # storage for cycle lengths
R <- rep(0,n) # storage for rewards

set.seed(39) # for reproducible result

for (i in 1:n) {
    L[i] <- min(T,rexp(1,lambda)) # generate L_k
    Rk <- rpois(1, L[i])
    if (L[i] == T) {
        R[i] <- Rk - c1 # planned replacement
    } else {
        R[i] <- Rk - c2 # failure replacement
    }
}

Let $R(t)$ denote the cumulative reward earned up to time $t$. We are interested in the ratio $R(t)/t$. Intuitively we expect it to converge to something, but to what? Let's illustrate this graphically by plotting $R(t)/t$ against $t$, evaluated at the cycle-completion times $t = L_1+\cdots+L_k$. In R this is easily accomplished with the cumsum function.

In [206]:
plot(cumsum(L), cumsum(R)/cumsum(L), type='l', xlab='t', ylab='R(t)/t')
abline(h=mean(R)/mean(L), col='red') # ratio of sample means

We see that the ratio $R(t)/t$ converges to the ratio between the average of $R_k$ and the average of $L_k$! Let's formalize this result.

Definition 1   A stochastic process $\{S(t), t\in \Gamma\}$ is regenerative if there exist time points $0= T_0<T_1<T_2<\cdots$ such that the processes $\{S(T_k+t), t\in \Gamma\}$ are independent and identically distributed for all $k$. The time interval between two successive regenerations, $L_{k+1}=T_{k+1}-T_k$, is called a cycle. Let the random variable $R_k$ denote the reward earned in cycle $k$. For any $t>0$, we let $R(t)$ denote the cumulative reward earned up to time $t$. The iid assumptions imply that both the $L_k$'s and the $R_k$'s are iid random variables.

Now for a regenerative continuous-time stochastic process $\{S(t), t\in \Gamma\}$, assume that $L_k$'s and $R_k$'s are iid with finite expected value. We have the following result.

Theorem 3 (Renewal-reward)   With probability 1,

$$ \lim_{t\to\infty}\frac{R(t)}{t} = \frac{\E{R_1}}{\E{L_1}}.$$

Proof   For any $t>0$, let $N(t)$ denote the number of cycles completed up to time $t$. First note that

$$ \sum_{k=1}^{N(t)} L_k \leq t < \sum_{k=1}^{N(t)+1} L_k. $$

Dividing all terms by $N(t)$, we obtain

$$ \frac{1}{N(t)}\sum_{k=1}^{N(t)}L_k \leq \frac{t}{N(t)} < \frac{1}{N(t)}\sum_{k=1}^{N(t)+1}L_k. $$

Taking the limit as $t\to\infty$, so that $N(t)\to\infty$ almost surely, the strong law of large numbers implies that both the leftmost and rightmost terms converge almost surely to $\E{L_1}$, and hence so does the middle term $t/N(t)$. This shows that with probability 1,

$$ \lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{\E{L_1}}. \tag{2} $$

Next we consider the case when all the rewards $R_k$ are nonnegative. By definition,

$$ \sum_{k=1}^{N(t)}R_k \leq R(t) \leq \sum_{k=1}^{N(t)+1}R_k. $$

This implies that

$$ \frac{1}{N(t)}\sum_{k=1}^{N(t)}R_k \times \frac{N(t)}{t} \leq \frac{R(t)}{t} \leq \frac{1}{N(t)}\sum_{k=1}^{N(t)+1} R_k \times \frac{N(t)}{t}. $$

Taking the limit as $t\to\infty$ and using (2), we arrive at the desired result. Finally, if the rewards may take negative values, we write $R_k = R_k^+ - R_k^-$, where $R_k^+ =\max(R_k,0)$ and $R_k^-=-\min(R_k,0)$, and apply the argument above to each part separately. $\blacksquare$
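As a sanity check for the earlier simulation, the limit in Theorem 3 can be computed in closed form under model (1): $\E{L_1} = 1-e^{-1}$ for $\min(1, \text{Exponential}(1))$, and the expected net reward is $\E{R_1} = \E{L_1} - c_1 e^{-1} - c_2(1-e^{-1})$, since a planned replacement occurs with probability $e^{-1}$. The snippet below reuses c1, c2, R, and L from the simulation cell above and compares the analytic ratio with the simulated one.

In [ ]:
EL <- 1 - exp(-1)                         # E[L_1] for min(1, Exponential(1))
ER <- EL - c1*exp(-1) - c2*(1 - exp(-1))  # E[R_1]: expected Poisson reward minus expected cost
c(analytic = ER/EL, simulated = mean(R)/mean(L))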