Modes of Convergence

Dec. 31, 2018 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\newcommand{\st}{\,\big|\,}$ $\newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$ $\newcommand{\P}[1]{\mathbb{P}\left(#1\right)}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

This is a brief overview of different modes of convergence of random variables. I have recently developed a sudden interest in the theory of convergence, a beautiful subject well worth mastering.

Let $X_1,X_2,...$ be a sequence of random variables, and $X$ be some other random variable. Let $F_n(t) = \P{X_n \leq t}$ denote the CDF of $X_n$ and $F$ the CDF of $X$.

Definition 1   $X_n$ converges almost surely to $X$, denoted as $X_n \overset{a.s.}{\to} X$, if for every $\epsilon>0$,

$$ \P{\lim_{n\to\infty} \left| X_n-X \right| < \epsilon} = 1 \tag{1} $$

The phrase "almost surely" is referring to the property that the set of elements that do not satisfy the property must have measure 0. Expanding (1) completely, we have

$$ \P{\left\{\omega \in \Omega\, \big|\, \lim_{n\to\infty} \left|X_n(\omega) - X(\omega)\right|>\epsilon\right\}}= 0. $$

In particular, $X_n$ converges almost surely to a constant $c$ if

$$ \P{\lim_{n\to\infty}X_n = c}=1. $$

Definition 2   $X_n$ converges to $X$ in probability, denoted as $X_n \overset{P}{\to} X$, if for every $\epsilon>0$,

$$\P{\left|X_n-X\right| > \epsilon} \to 0. \tag{2}$$

The key difference between (2) and (1) is the placement of the limit. Convergence in probability can be expressed this way:

$$ \lim_{n\to\infty}\P{\left\{\omega\in\Omega\, \big| \, \left|X_n(\omega)-X(\omega)\right|>\epsilon \right\}} = 0. $$

Here, as $n\to\infty$, the set whose probability we measure may "move around" the sample space; the only requirement is that its measure converges to 0. For almost sure convergence, by contrast, the exceptional set of $\omega$ for which $X_n(\omega)$ fails to converge to $X(\omega)$ is a single fixed set, and that set must have measure 0.

Theorem 1   Convergence in probability does not imply almost sure convergence.

Proof   Here is the classic counterexample. Take the sample space $\Omega = [0,1]$ with the uniform (Lebesgue) probability measure, and define $X(s)=s$ for every $s\in [0,1]$. We also define a sequence of random variables $X_1, X_2,...$ by

$$ \begin{aligned} X_1(s) &= s + I(0<s<1), & X_2(s) &= s + I(0<s<1/2), \\ X_3(s) &= s + I(1/2<s<1), & X_4(s) &= s + I(0<s<1/3), \\ X_5(s) &= s+I(1/3<s<2/3), & X_6(s) &= s + I(2/3<s<1), \end{aligned} $$

and so on: the $k$-th block consists of $k$ indicator intervals of length $1/k$ sweeping across $[0,1]$.

Immediately we see that $X_n \overset{P}{\to}X$: for any $\epsilon >0$, the probability $\P{\left|X_n - X\right|>\epsilon}$ is simply the length of the corresponding indicator interval, and these lengths shrink to 0 as $n\to \infty$. However, $X_n$ does NOT converge to $X$ almost surely, since

$$ \P{\left\{s\in\Omega\, \big|\, X_n(s) \not\to X(s)\right\}}=1. $$

This is because for every $s\in [0,1]$, the indicator is activated once in each block, hence infinitely often as $n\to \infty$, so $X_n(s)$ cannot converge to $X(s)$. $\blacksquare$
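To make the counterexample concrete, here is a small simulation sketch (using NumPy; the helper `interval`, the sample size, and the choice of $n$ values are mine, not part of the construction). It estimates $\P{\left|X_n-X\right|>\epsilon}$ by Monte Carlo and confirms that it shrinks like the interval lengths, even though every sample path keeps getting bumped by the indicator.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0.0, 1.0, size=100_000)   # sample points omega = s, with X(s) = s

def interval(n):
    """Endpoints (a, b) of the indicator interval of X_n in the typewriter sequence:
    block k consists of the k intervals (j/k, (j+1)/k) for j = 0, ..., k-1."""
    k, start = 1, 0
    while start + k < n:
        start += k
        k += 1
    j = n - start - 1
    return j / k, (j + 1) / k

eps = 0.5  # any eps in (0, 1) gives the same answer, since |X_n - X| is either 0 or 1
for n in [1, 10, 100, 1_000, 5_000]:
    a, b = interval(n)
    # |X_n(s) - X(s)| = I(a < s < b), so P(|X_n - X| > eps) is the interval length b - a
    print(n, round(b - a, 4), np.mean((s > a) & (s < b)))

# The printed frequencies shrink to 0 (convergence in probability), yet every fixed s
# lands in one interval of every block, so the indicator fires infinitely often and
# X_n(s) does not converge to X(s) for any s (no almost sure convergence).
```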

Definition 3   $X_n$ converges to $X$ in distribution, denoted as $X_n \rightsquigarrow X$, if

$$ \lim_{n\to\infty} F_n(t) = F(t) \tag{3} $$

for all $t$ at which $F$ is continuous.

In other words, $X_n$ converges to $X$ in distribution if the CDF $F_n$ converges weakly to $F$. In particular, $X_n$ converges to a constant $c$ in distribution if, for all $t\neq c$,

$$ \lim_{n\to \infty} F_n(t) = \delta_c(t) = \begin{cases} 1 & \text{ if } t\geq c \\ 0 & \text{ if } t < c \end{cases} $$
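As a numerical illustration of this last point, here is a minimal sketch (using NumPy; the choice $X_n \sim N(c, 1/n)$ and the evaluation points are arbitrary). The empirical CDF of $X_n$ tends to 0 below $c$ and to 1 above $c$, while at $t=c$ it hovers near $1/2$; that is exactly why the discontinuity point $t=c$ is excluded from the definition.

```python
import numpy as np

rng = np.random.default_rng(1)
c = 2.0
for n in [1, 10, 100, 10_000]:
    x_n = rng.normal(loc=c, scale=1.0 / np.sqrt(n), size=100_000)   # X_n ~ N(c, 1/n)
    # empirical CDF F_n(t) at points below, at, and above c
    print(n, [round(np.mean(x_n <= t), 3) for t in (1.5, 2.0, 2.5)])

# F_n(1.5) -> 0 and F_n(2.5) -> 1, matching delta_c, while F_n(2.0) stays near 0.5;
# t = c is the discontinuity point of delta_c and is excluded from the definition.
```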

Theorem 2   $X_n \overset{P}{\to} X$ implies $X_n \rightsquigarrow X$.

Proof   Assume convergence in probability; we need to show that $F_n(x) \to F(x)$ at every continuity point $x$ of $F$. Fix $\epsilon >0$ and let $x$ be a continuity point of $F$. Then

$$ \begin{aligned} F_n(x) &= \P{X_n \leq x}\\ &= \P{X_n \leq x, X \leq x+\epsilon} + \P{X_n \leq x, X > x+\epsilon} \\ &\leq \P{X \leq x+\epsilon} + \P{\left|X_n - X\right|>\epsilon} \\ &=F(x+\epsilon) + \P{\left|X_n-X\right|>\epsilon} \end{aligned} $$

Also we have

$$ \begin{aligned} F(x-\epsilon) &= \P{X\leq x-\epsilon} \\ &=\P{X\leq x-\epsilon, X_n\leq x} + \P{X\leq x-\epsilon, X_n >x}\\ &\leq F_n(x) + \P{\left|X_n-X\right|>\epsilon} \end{aligned} $$

Combining the two inequalities above, we obtain

$$ F(x-\epsilon)-\P{|X_n-X|>\epsilon} \leq F_n(x) \leq F(x+\epsilon) + \P{|X_n-X|>\epsilon}. $$

Taking $\liminf$ and $\limsup$ as $n\to \infty$, the term $\P{|X_n-X|>\epsilon}$ vanishes, so we are left with

$$ F(x-\epsilon) \leq \liminf_{n\to\infty} F_n(x) \leq \limsup_{n\to\infty} F_n(x) \leq F(x+\epsilon). $$

Finally, letting $\epsilon \to 0$ and using the continuity of $F$ at $x$, we conclude that $\lim_{n\to\infty}F_n(x)$ exists and equals $F(x)$. $\blacksquare$

Theorem 3   The converse of Theorem 2 is not true in general. However, if $X_n \rightsquigarrow c$ for some constant $c$, then $X_n \overset{P}{\to} c$.

Proof   First we give the classic counterexample. Let $X\sim N(0,1)$ be a standard normal random variable, and define $X_n = -X$ for all $n$. By symmetry, $X_n \sim N(0,1)$ for all $n$, so $X_n \rightsquigarrow X$ trivially. However,

$$ \P{\left|X_n-X\right| > \epsilon} = \P{2\left|X\right|>\epsilon} = \P{\left|X\right|>\epsilon/2} > 0, $$

which does not depend on $n$, so we do not have convergence in probability.
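A quick numerical check of this counterexample (a sketch using NumPy; the sample size and $\epsilon$ are arbitrary choices): the marginal laws of $X_n$ and $X$ agree perfectly, yet $\P{\left|X_n-X\right|>\epsilon}$ stays bounded away from 0 for every $n$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1_000_000)   # X ~ N(0, 1)
x_n = -x                             # X_n = -X has the same N(0, 1) law for every n

eps = 0.1
# Identical marginal laws give X_n ~> X for free, but |X_n - X| = 2|X|,
# so P(|X_n - X| > eps) = P(|X| > eps / 2) is the same positive number for every n.
print(np.mean(np.abs(x_n - x) > eps))   # stays near 0.96, never tends to 0
```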

Now we prove the second statement. We need to show that for all $\epsilon>0$, $\P{\left|X_n-c\right|>\epsilon} \to 0$ as $n\to\infty$. Let $\epsilon >0$ be given. We have

$$ \begin{aligned} \P{\left|X_n-c\right|>\epsilon} &\leq \P{X_n\leq c-\epsilon} + \P{X_n>c+\epsilon} \\ &=F_n(c-\epsilon) + 1- F_n(c+\epsilon) \\ &\to F(c-\epsilon) + 1- F(c+\epsilon) \\ &= 0 + 1 - 1 \quad [\text{since } F(t) = \delta_c(t) \text{ and } c\pm\epsilon \text{ are continuity points}]\\ &= 0 \end{aligned} $$

This concludes the proof. $\blacksquare$

Theorem 4   $X_n \overset{a.s.}{\to} X$ implies $X_n \overset{P}{\to} X$.

Proof   Fix $\epsilon>0$ and define a sequence of sets $\{A_n\}$ by

$$ A_n = \bigcup_{m=n}^{\infty}\left\{\omega\in\Omega \st \left|X_m(\omega)-X(\omega)\right|>\epsilon \right\}. $$

From the definition, we see that this is a decreasing sequence of sets, i.e.,

$$ A_1 \supseteq A_2 \supseteq A_3 \cdots. $$

This decreasing sequence tends to a limiting set, namely its countable intersection:

$$ A_{\infty} := \lim_{n\to\infty} A_n = \bigcap_{n=1}^{\infty} A_n. $$

Since $\{A_n\}$ is decreasing, by the monotonicity of measure, the sequence of their corresponding probabilities $\{\P{A_n}\}$ is also a decreasing sequence. Furthermore, using the continuity property of probability, the sequence tends to the limit $\P{A_{\infty}}$. That is,

$$ \P{A_1} \geq \P{A_2} \geq \P{A_3} \cdots \to \P{A_{\infty}}. $$

Now we use the assumption that $X_n$ converges to $X$ almost surely. Applying (1) for every $\epsilon>0$ and intersecting over $\epsilon = 1/k$, the set of sample points where $X_n(\omega)$ fails to converge to $X(\omega)$ has measure 0; that is, $\P{B}=0$ where

$$ B := \left\{\omega \in \Omega \st X_n(\omega) \not\to X(\omega) \right\}. $$

For any $\omega \in B^c$, we have $\lim_{n\to\infty} X_n(\omega) = X(\omega)$. In particular, for our fixed $\epsilon>0$ there exists a positive integer $N$ such that

$$ n \geq N \implies \left|X_n(\omega) - X(\omega)\right|<\epsilon. $$

Hence for any $n\geq N$, $\omega$ does not belong to the set $A_n$, and in particular does not belong to $A_{\infty}$. Since $\omega \in B^c$ was arbitrary, we conclude that $A_{\infty} \subseteq B$. Since $\P{B}=0$, we have $\P{A_{\infty}}=0$ as well.

Finally, since $\P{\left|X_n-X\right|>\epsilon}\leq \P{A_n}$ for all $n$,

$$ \lim_{n\to\infty} \P{\left|X_n-X\right|>\epsilon}\leq \lim_{n\to\infty} \P{A_n} = 0, $$

which means, by definition, that $X_n \overset{P}{\to} X$. $\blacksquare$

Theorem 5   If $X_n \overset{P}{\to} X$ and $Y_n \overset{P}{\to}Y$, then

$$ \begin{aligned} X_n + Y_n &\overset{P}{\to} X+Y \\ X_nY_n &\overset{P}{\to}XY. \end{aligned} $$

Proof   For any $\epsilon>0$, using the triangle inequality and the union bound, we have

$$ \begin{aligned} \P{\left|X_n+Y_n-X-Y\right|>\epsilon} &\leq \P{\left|X_n-X\right| + \left|Y_n-Y\right|>\epsilon} \\ &\leq \P{\left|X_n-X\right|> \epsilon/2} + \P{\left|Y_n-Y\right|>\epsilon/2} \\ &\to 0 + 0 = 0. \end{aligned} $$

This proves the first part. Similarly, for the second part, let $\epsilon >0$ be given.

$$ \begin{aligned} \P{\left|X_nY_n - XY\right|>\epsilon}&= \P{\left|X_nY_n - X_nY + X_nY - XY\right|>\epsilon} \\ &=\P{\left|X_n(Y_n-Y) + Y(X_n-X)\right|>\epsilon} \\ &\leq \P{\left|X_n(Y_n-Y)\right|>\epsilon/2} + \P{\left|Y(X_n-X)\right|>\epsilon/2}. \end{aligned} $$

We need to show that each of the two terms on the right goes to 0. To proceed, take any $r > 0$. We have

$$ \begin{aligned} \left\{\left|X_n(Y_n-Y)\right|>\epsilon/2\right\}&\subseteq \left\{\left|X_n\right|>r+1\right\}\cup \left\{\left|Y_n-Y\right|>\epsilon/(2(r+1))\right\} \\ &= \left\{\left|X_n-X+X\right|>r+1\right\}\cup \left\{\left|Y_n-Y\right|>\epsilon/(2(r+1))\right\}\\ &\subseteq \left\{\left|X_n-X\right|>1\right\} \cup \left\{\left|X\right|>r\right\}\cup \left\{\left|Y_n-Y\right|>\epsilon/(2(r+1))\right\}. \end{aligned} $$

For fixed $r$, the probabilities of the first and third sets vanish as $n\to\infty$ by the convergence of $X_n$ and $Y_n$ in probability, so

$$ \limsup_{n\to\infty} \P{\left|X_n(Y_n-Y)\right|>\epsilon/2} \leq \P{\left|X\right|>r}. $$

Since this holds for every $r>0$ and $\P{\left|X\right|>r}\to 0$ as $r\to\infty$, the left-hand side must be 0.

Similarly, for any $r>0$,

$$ \P{\left|Y(X_n-X)\right|>\epsilon/2} \leq \P{\left|Y\right|>r} + \P{\left|X_n-X\right|>\epsilon/(2r)}, $$

so $\limsup_{n\to\infty} \P{\left|Y(X_n-X)\right|>\epsilon/2} \leq \P{\left|Y\right|>r}$ for every $r>0$, and hence this limit is 0 as well.

This concludes the proof. $\blacksquare$
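Here is a small simulation sketch of Theorem 5 (using NumPy; the construction $X_n = X + Z_n/\sqrt{n}$, the sample size, and the tolerance are arbitrary choices of mine). Both empirical probabilities shrink toward 0 as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(3)
m, eps = 100_000, 0.05
x = rng.standard_normal(m)   # a realization of X at each sample point
y = rng.standard_normal(m)   # a realization of Y
for n in [10, 100, 1_000, 10_000]:
    x_n = x + rng.standard_normal(m) / np.sqrt(n)   # X_n -> X in probability
    y_n = y + rng.standard_normal(m) / np.sqrt(n)   # Y_n -> Y in probability
    print(n,
          np.mean(np.abs((x_n + y_n) - (x + y)) > eps),   # P(|X_n + Y_n - (X + Y)| > eps)
          np.mean(np.abs(x_n * y_n - x * y) > eps))       # P(|X_n Y_n - X Y| > eps)
```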

Theorem 6   In general, $X_n \rightsquigarrow X$ and $Y_n\rightsquigarrow Y$ do not imply $X_n+Y_n \rightsquigarrow X+Y$.

Proof   As a counterexample, suppose $X$ and $Y$ are independent $N(0,1)$ random variables. Let $X_n = X = -Y_n$ for all $n\in \mathbb{N}$. Then $X_n \rightsquigarrow X$ trivially, and since $-X \sim N(0,1)$, we also have $Y_n \rightsquigarrow Y$. However, $X_n+Y_n \equiv 0$, which certainly does not converge in distribution to $X+Y \sim N(0,2)$. $\blacksquare$
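Numerically, the counterexample looks like this (a sketch using NumPy; the sample size is arbitrary). The sum $X_n+Y_n$ is identically 0, while $X+Y$ has standard deviation about $\sqrt{2}$.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 100_000
x = rng.standard_normal(m)   # X ~ N(0, 1)
y = rng.standard_normal(m)   # Y ~ N(0, 1), independent of X
x_n, y_n = x, -x             # X_n = X and Y_n = -X: each has the correct marginal law

print(np.std(x_n + y_n))     # identically 0: X_n + Y_n is the constant 0
print(np.std(x + y))         # about sqrt(2) = 1.414...: X + Y ~ N(0, 2)
```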

In order to proceed, we need the following theorem, which we give without proof.

Theorem 7 (Portmanteau Lemma)   Convergence in distribution $X_n \rightsquigarrow X$ is equivalent to any of the following statements:

  1. $\E{f(X_n)} \to \E{f(X)}$ for all bounded, Lipschitz functions $f$.
  2. $\limsup \P{X_n\in C} \leq \P{X\in C}$ for all closed sets $C$.

Proof   Reserved for later post. $\square$

Theorem 8 (Continuous Mapping Theorem)   Let $X_n, X$ be random variables taking values in a metric space $(\mathcal{X}, d)$, and let $g: \mathcal{X}\to \mathcal{Y}$ be a continuous function into another metric space. Then

  1. If $X_n \overset{a.s.}{\to} X$, then $g(X_n) \overset{a.s.}{\to} g(X)$
  2. If $X_n \overset{P}{\to} X$, then $g(X_n) \overset{P}{\to} g(X)$
  3. If $X_n \rightsquigarrow X$, then $g(X_n) \rightsquigarrow g(X)$

Proof   The first statement is easiest to prove. By continuity of $g$, for any $\omega \in \Omega$, we have

$$ X_n(\omega) \to X(\omega) \implies g(X_n(\omega)) \to g(X(\omega)), $$

where convergence is with respect to the metric $d$. This implies the following set inclusion:

$$ \left\{\omega \st \lim_{n\to\infty}X_n(\omega) = X(\omega) \right\} \subseteq \left\{\omega \st \lim_{n\to\infty} g(X_n(\omega)) = g(X(\omega))\right\} $$

By almost sure convergence, the left side has probability 1, implying that the right side must have probability 1 as well. This concludes the proof.

For convergence in probability, we fix any $\epsilon>0$ and take a sequence $\delta_m >0$ decreasing to 0 (denoted $\delta_m \searrow 0$). This gives rise to a sequence of sets:

$$ A_m = \left\{x \in \mathcal{X} \st \exists\, y : \left|\left|x-y\right|\right|<\delta_m, \norm{g(x)-g(y)}>\epsilon \right\}, $$

where $\norm{\cdot}$ denotes, with a slight abuse of notation, the metric on the relevant space. From the definition we see that $\{A_m\}$ is a sequence of decreasing sets, whose limit is given by

$$ A_{\infty} = \left\{x \in \mathcal{X}\st \lim_{y\to x} g(y) \neq g(x)\right\} = \emptyset. $$

Now for any fixed $m$, again using the union bound, we have that

$$ \begin{aligned} \P{\norm{g(X_n)-g(X)}>\epsilon}&\leq \P{\norm{X_n-X}\geq \delta_m} + \P{X\in A_m} \\ &\to \P{X\in A_m} \quad\,\, [\text{as } n\to \infty]\\ &\to \P{\emptyset} = 0 \quad \quad[\text{as } m\to \infty]. \end{aligned} $$

For convergence in distribution, we use the fact that if $g$ is continuous, for any closed set $B\subset\mathcal{Y}$, the preimage $g^{-1}(B)$ is also a closed set. Hence,

$$ \begin{aligned} \limsup_{n\to\infty} \P{g(X_n) \in B} & = \limsup_{n\to \infty} \P{X_n \in g^{-1}(B)} \\ &\leq \P{X \in g^{-1}(B)} \quad [\text{Portmanteau}] \\ &= \P{g(X) \in B}. \end{aligned} $$

Hence $g(X_n)\rightsquigarrow g(X)$ by Theorem 7. $\blacksquare$
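As a small illustration of the second statement, here is a simulation sketch (using NumPy; the Uniform$(0,1)$ data, $g=\exp$, and the tolerance are arbitrary choices). The sample mean $\bar X_n$ converges in probability to $1/2$ by the law of large numbers, and the continuous mapping theorem then gives $\exp(\bar X_n) \overset{P}{\to} \exp(1/2)$.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, eps, reps = 0.5, 0.05, 10_000
g = np.exp   # any continuous g works; exp is used purely as an example
for n in [10, 100, 1_000]:
    # Xbar_n -> mu = 0.5 in probability by the law of large numbers
    xbar = rng.uniform(0.0, 1.0, size=(reps, n)).mean(axis=1)
    # continuous mapping: P(|g(Xbar_n) - g(mu)| > eps) -> 0
    print(n, np.mean(np.abs(g(xbar) - g(mu)) > eps))
```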

Theorem 9 (Slutsky's Theorem)   Let $X_n, Y_n, X, Y$ be random variables, and $c$ a constant. Then,

  1. $X_n \overset{a.s.}{\to} X$ and $(X_n-Y_n) \overset{a.s.}{\to} 0$ implies $Y_n \overset{a.s.}{\to} X$.
  2. $X_n \overset{p}{\to} X$ and $(X_n-Y_n) \overset{p}{\to} 0$ implies $Y_n \overset{p}{\to} X$.
  3. $X_n \rightsquigarrow X$ and $(X_n-Y_n) \overset{p}{\to} 0$ implies $Y_n \rightsquigarrow X$.

Proof   To prove the first statement, we use the triangle inequality to obtain the following set inclusions:

$$ \begin{aligned} \left\{\lim_{n\to\infty} \left|Y_n-X\right| > \epsilon \right\} &\subseteq \left\{\lim_{n\to\infty} \left(\left|Y_n-X_n\right| + \left|X_n-X\right|\right) >\epsilon \right\} \\ &\subseteq \left\{\lim_{n\to\infty} \left|Y_n-X_n\right|>\epsilon/2\right\} \cup \left\{\lim_{n\to\infty} \left|X_n-X\right|>\epsilon/2\right\} \end{aligned} $$

This implies that

$$ \begin{aligned} \P{\lim_{n\to\infty} \left|Y_n-X\right|>\epsilon} &\leq \P{\lim_{n\to\infty} \left|Y_n-X_n\right|>\epsilon/2} + \P{\lim_{n\to\infty}\left|X_n-X\right|>\epsilon/2} \\ &= 0 + 0 = 0, \end{aligned} $$

which proves the first part.

The proof for the second part is almost exactly like that for the first part, except we remove the limits in the intermediate steps, and take the limit in the final step to prove the desired result.

To prove the third statement, we appeal to the Portmanteau Lemma: showing $Y_n \rightsquigarrow X$ is equivalent to showing $\E{g(Y_n)} \to \E{g(X)}$ for all bounded Lipschitz functions $g$. Since $g$ is Lipschitz, for all $\epsilon>0$ there exists a $\delta>0$ such that $\left|x-y\right|<\delta$ implies $\left|g(x)-g(y)\right|<\epsilon$. Also, since $g$ is bounded, we may assume $|g|\leq M$ for some constant $M$. Hence,

$$ \begin{aligned} \left|\E{g(Y_n)} - \E{g(X)}\right| &\leq \left|\E{g(Y_n)} - \E{g(X_n)}\right| + \left|\E{g(X_n)} - \E{g(X)} \right|\\ &\leq \E{\left|g(Y_n) - g(X_n)\right|I(\left|X_n-Y_n\right|\leq\delta)} \\ & \quad + \E{\left|g(Y_n) - g(X_n)\right|I(\left|X_n-Y_n\right|>\delta)}\\ & \quad + \left|\E{g(X_n)}-\E{g(X)}\right| \\ & \leq \epsilon + 2M \P{\left|X_n-Y_n\right|>\delta} + \left|\E{g(X_n)}-\E{g(X)}\right| \\ & \to \epsilon + 0 + 0 = \epsilon. \end{aligned} $$

Since $\epsilon>0$ was arbitrary, $\E{g(Y_n)} \to \E{g(X)}$, and the result follows by the Portmanteau Lemma. $\blacksquare$
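The canonical application of the third statement is the studentized mean. Below is a simulation sketch (using NumPy; the normal data, sample sizes, and thresholds are arbitrary choices): with $X_n=\sqrt{n}(\bar X_n-\mu)/\sigma$, which here is exactly $N(0,1)$, and $Y_n=\sqrt{n}(\bar X_n-\mu)/S_n$, we have $X_n - Y_n \overset{P}{\to} 0$, so Slutsky gives $Y_n \rightsquigarrow N(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, reps = 1.0, 2.0, 20_000
for n in [5, 50, 500]:
    data = rng.normal(mu, sigma, size=(reps, n))
    xbar = data.mean(axis=1)
    s = data.std(axis=1, ddof=1)               # sample standard deviation S_n
    z_n = np.sqrt(n) * (xbar - mu) / sigma     # X_n: exactly N(0, 1) for normal data
    t_n = np.sqrt(n) * (xbar - mu) / s         # Y_n: sigma replaced by the estimate S_n
    print(n,
          np.mean(np.abs(z_n - t_n) > 0.1),    # P(|X_n - Y_n| > 0.1) -> 0
          np.mean(t_n <= 1.0))                 # -> Phi(1) = 0.841..., i.e. Y_n ~> N(0, 1)
```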

Theorem 10   Convergence in each component, under the right conditions, implies joint convergence:

$$ \begin{aligned} X_n \overset{a.s.}{\to}X,\,\, Y_n \overset{a.s.}{\to}Y &\implies (X_n, Y_n) \overset{a.s.}{\to}(X,Y) \\ X_n \overset{p}{\to} X,\,\, Y_n \overset{p}{\to} Y &\implies (X_n, Y_n) \overset{p}{\to} (X, Y) \\ X_n \rightsquigarrow X,\,\, Y_n \rightsquigarrow c &\implies (X_n, Y_n) \rightsquigarrow (X, c) \end{aligned}$$

Proof   The first statement follows from

$$ \begin{aligned} & \quad\, \P{\lim_{n\to\infty}\norm{(X_n, Y_n)-(X,Y)}>\epsilon} \\ &\leq \P{\lim_{n\to\infty} \norm{X_n-X}>\epsilon/\sqrt{2}} + \P{\lim_{n\to\infty}\norm{Y_n-Y}>\epsilon/\sqrt{2}} \\ &=0+0=0. \end{aligned} $$

The proof for the second statement is almost the same, except we remove the limits in the intermediate steps, and take the limit in the final step.

For the third statement, we have that $Y_n \overset{p}{\to} c$ from Theorem 3. This implies that

$$ \P{\norm{(X_n, Y_n)-(X_n, c)}>\epsilon} =\P{\norm{Y_n-c}>\epsilon} \to 0. $$

Let $g$ be any bounded Lipschitz function on the product space. Since $x\mapsto g(x,c)$ is itself bounded and Lipschitz, and $X_n \rightsquigarrow X$, the Portmanteau Lemma gives

$$ \E{g(X_n,c)} \to \E{g(X,c)}. $$

This implies that $(X_n, c) \rightsquigarrow (X, c)$. Finally, by Slutsky's theorem, we conclude that $(X_n, Y_n) \rightsquigarrow (X, c)$. $\blacksquare$

Corollary 1   If $g$ is a continuous function, then

$$ \begin{aligned} X_n \overset{a.s.}{\to}X,\,\, Y_n \overset{a.s.}{\to}Y &\implies g(X_n, Y_n) \overset{a.s.}{\to}g(X,Y) \\ X_n \overset{p}{\to} X,\,\, Y_n \overset{p}{\to} Y &\implies g(X_n, Y_n) \overset{p}{\to} g(X, Y) \\ X_n \rightsquigarrow X,\,\, Y_n \rightsquigarrow c &\implies g(X_n, Y_n) \rightsquigarrow g(X, c) \end{aligned}$$

Proof   This follows directly from Theorem 10 and the continuous mapping theorem. $\blacksquare$
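For instance, the first line gives almost sure consistency of the plug-in variance: by the strong law of large numbers, $\frac{1}{n}\sum_i X_i^2 \to \E{X^2}$ and $\bar X_n \to \E{X}$ almost surely, and $g(a,b)=a-b^2$ is continuous, so $\frac{1}{n}\sum_i X_i^2 - \bar X_n^2 \to \mathrm{Var}(X)$ almost surely. A quick sketch (using NumPy; the Exponential(1) data and sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exp(1): E[X] = 1, Var(X) = 1
for n in [100, 10_000, 1_000_000]:
    m1 = x[:n].mean()            # -> E[X] = 1 almost surely (strong law of large numbers)
    m2 = (x[:n] ** 2).mean()     # -> E[X^2] = 2 almost surely
    # g(a, b) = a - b^2 is continuous, so g(m2, m1) -> Var(X) = 1 almost surely
    print(n, round(m2 - m1 ** 2, 4))
```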