Spectral Theory II

Jan. 20, 2020 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\newcommand{\st}{\,\big|\,}$ $\newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$ $\newcommand{\P}[1]{\mathbb{P}\left(#1\right)}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

It is the highlight of linear algebra that given an $n\times p$ matrix $X$, there exists a decomposition $X=UDV\tr$ where $D$ is a $p \times p$ diagonal matrix whose diagonal entries are the square roots of the descending eigenvalues $d_1^2\geq \cdots \geq d_p^2 \geq 0$ shared by $X X\tr$ and $X\tr X$. These eigenvalues exist and are real because $X X\tr$ and $X\tr X$ are self-adjoint. The matrices $U_{n\times p}$ and $V_{p\times p}$ have orthonormal columns consisting of eigenvectors of $X X\tr$ and $X\tr X$ respectively, so that $U\tr U=V\tr V = I_p$.
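This decomposition is easy to check numerically. The following numpy sketch (the $6\times 3$ random matrix is an arbitrary example) confirms that $X = UDV\tr$, that the columns of $U$ are orthonormal, and that the $d_i^2$ are the eigenvalues of $X\tr X$.

```python
import numpy as np

# Sketch: numerical check of X = U D V^T for an arbitrary 6x3 example matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))                  # n x p matrix

U, d, Vt = np.linalg.svd(X, full_matrices=False) # thin SVD: U is 6x3, Vt is 3x3
print(np.allclose(X, U @ np.diag(d) @ Vt))       # X = U D V^T          -> True
print(np.allclose(U.T @ U, np.eye(3)))           # orthonormal columns  -> True

# The d_i^2 are the descending eigenvalues of X^T X.
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]      # eigvalsh returns ascending order
print(np.allclose(d**2, eigvals))                # True
```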

Given a unit vector $v\in \mathbb{R}^p$, the orthogonal projection onto the linear subspace spanned by $v$ is represented by the self-adjoint map $P=vv\tr$. It follows that if we project each row of $X$ onto $v$, the resulting matrix $Y := X P$ has rank 1, and its columns are all multiples of the derived variable $z := Xv$. We seek the principal component direction $v_1$ that maximizes the total variance of the derived variable $z$ (with the columns of $X$ centered), given by $z\tr z = v\tr(X\tr X)v$. This is a quadratic form in the self-adjoint matrix $H := X\tr X$; on a real Euclidean space, self-adjoint simply means symmetric. We have the following important result.

Theorem 1   Let $H$ be self-adjoint. Over the unit sphere $\norm{x}=1$, the quadratic form $q(x) = (x, Hx)$ is maximized (or minimized) by the unit eigenvector of $H$ corresponding to the largest (or smallest) eigenvalue.

By taking $H=X\tr X$, Theorem 1 shows that $v_1$ is the first column of $V$. Its derived variable, $z_1 = UDV\tr v_1 = d_1 u_1$ is proportional to the first column of $U$, and is often referred to as the first principal component of $X$. It turns out that Theorem 1 is equivalent to the fundamental theorem of Spectral Theory:
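As a sketch of this computation (the $50\times 4$ data matrix is an arbitrary example, centered column-wise as PCA assumes), the first right singular vector $v_1$ yields the first principal component $z_1 = Xv_1 = d_1u_1$, and its variance $z_1\tr z_1$ equals the largest eigenvalue of $X\tr X$:

```python
import numpy as np

# Sketch: first principal component direction v_1 and component z_1 = X v_1 = d_1 u_1.
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
X = X - X.mean(axis=0)                           # center the columns, as PCA assumes

U, d, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                                       # first column of V (first right singular vector)
z1 = X @ v1                                      # first principal component
print(np.allclose(z1, d[0] * U[:, 0]))                             # z_1 = d_1 u_1        -> True
print(np.isclose(z1 @ z1, np.linalg.eigvalsh(X.T @ X).max()))      # maximal v^T X^T X v  -> True
```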

Theorem 1'   A self-adjoint map $H$ acting on a Euclidean space $X$ has a set of eigenvalues (called the spectrum of $H$) and genuine eigenvectors that form an orthonormal basis of $X$.

First we show that Theorem 1 follows from Theorem 1'; we then prove Theorem 1' itself via the Rayleigh quotient.

Proof   Suppose Theorem 1' is true. That is, suppose $X$ has an orthonormal basis $f_1,...,f_n$ consisting of eigenvectors of $H$ with corresponding eigenvalues $a_1,...,a_n$. Every vector $x\in X$ can be written as a linear combination of the basis vectors:

$$ x = \sum z_i f_i = \begin{bmatrix} f_1...f_n \end{bmatrix} \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} := Mz. $$

Applying $H$ to the above expression and using the fact that $Hf_i=a_if_i$, we have that

$$ Hx = \sum z_iHf_i = \sum z_i a_i f_i. $$

After introducing the new variable $z$, the quadratic form $q(x)=(x,Hx)$ can be diagonalized since

$$ \begin{aligned} (x, Hx) &=(Mz, HMz) = (z, M^*HMz) \\ & = \left(\sum z_if_i, \sum z_i a_i f_i\right) = \sum a_i z_i^2. \\ \end{aligned} \tag{1} $$

It follows that $D= M^*HM = \text{diag}(a_1,...,a_n)$ is the diagonal matrix in the following spectral decomposition:

$$ H = MDM^*. $$
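Numerically, `np.linalg.eigh` produces exactly such an $M$ and $D$. A short sketch (the $4\times 4$ symmetric matrix is an arbitrary example) verifies the decomposition and the diagonalized quadratic form (1):

```python
import numpy as np

# Sketch: H = M D M^* for a symmetric H, with the orthonormal eigenvectors as columns of M.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2                          # an arbitrary self-adjoint (symmetric) example

a, M = np.linalg.eigh(H)                   # eigenvalues a_i (ascending), eigenvectors as columns of M
D = np.diag(a)
print(np.allclose(M.T @ M, np.eye(4)))     # orthonormal eigenbasis  -> True
print(np.allclose(H, M @ D @ M.T))         # H = M D M^*             -> True

# The quadratic form diagonalizes in the coordinates z = M^* x:  (x, Hx) = sum a_i z_i^2.
x = rng.standard_normal(4)
z = M.T @ x
print(np.isclose(x @ H @ x, np.sum(a * z**2)))   # True
```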

Define the Rayleigh quotient as $R(x) = (x, Hx)\big/(x,x)$, which using the formulation (1) can be written in terms of the new variable $z$:

$$ R(x) = \frac{(x,Hx)}{(x,x)} = \frac{\sum a_i z_i^2}{\sum z_i^2}. $$

Without loss of generality, suppose the eigenvalues $a_1\leq \cdots \leq a_n$ are arranged in increasing order. Since $R(x)$ is a weighted average of the $a_i$ with weights $z_i^2\big/\sum_j z_j^2$, it always lies between $a_1$ and $a_n$; taking only $z_1 \neq 0$ (or only $z_n \neq 0$) while setting the rest $z_i = 0$ attains these bounds. Hence the minimum and the maximum of the Rayleigh quotient are $a_1$ and $a_n$, achieved at $f_1$ and $f_n$ respectively. That is,

$$ \begin{aligned} a_1 &= \min_x R(x) = R(f_1)\\ a_n &= \max_x R(x) = R(f_n). \end{aligned} $$
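A quick numerical sanity check (numpy sketch; the $5\times 5$ symmetric matrix and the 1000 random test vectors are arbitrary choices) shows the Rayleigh quotient staying inside $[a_1, a_n]$ and attaining the endpoints at the extreme eigenvectors:

```python
import numpy as np

# Sketch: the Rayleigh quotient R(x) = (x, Hx)/(x, x) lies in [a_1, a_n]
# and attains the endpoints at the extreme eigenvectors.
rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2

def rayleigh(H, x):
    return (x @ H @ x) / (x @ x)

a, F = np.linalg.eigh(H)                   # ascending eigenvalues, eigenvectors in the columns of F
samples = [rayleigh(H, rng.standard_normal(5)) for _ in range(1000)]
print(a[0] <= min(samples) and max(samples) <= a[-1])    # True
print(np.isclose(rayleigh(H, F[:, 0]), a[0]))            # minimum attained at f_1 -> True
print(np.isclose(rayleigh(H, F[:, -1]), a[-1]))          # maximum attained at f_n -> True
```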

By the homogeneity property $R(kx) = R(x)$ for all $k\neq 0$, optimizing the Rayleigh quotient is equivalent to optimizing the quadratic form $q(x)$ constrained to the unit sphere $\norm{x}=1$. That is,

$$ \min_x \frac{(x, Hx)}{(x,x)} = \min_{\norm{x}=1} q(x). $$

The existence of the minimum is guaranteed because $R$ restricted to the unit sphere is continuous and the sphere is compact by the Heine-Borel theorem. Let $f$ be a unit vector attaining this minimum. For any other vector $g$ and $t\in \mathbb{R}$, the real-valued function $R(f+tg)$ can be expressed as

$$ \begin{aligned} R(f+tg) &= \frac{(f+tg, Hf + tHg)}{(f+tg, f+tg)}\\ &= \frac{(f,Hf) + 2t(g,Hf) + t^2(g,Hg)}{(f,f) + 2t(g,f) + t^2(g,g)} \\ &:= \frac{\alpha(t)}{\beta(t)} \end{aligned} \tag{2} $$

Since $f$ minimizes $R(x)$, the minimum of $R(f+tg)$ must occur at $t=0$, where $\beta(0)=(f,f)=1$. Let $r:=\alpha(0)/\beta(0)$ denote the minimum value $R(f)$. Using calculus, we have

$$ \frac{d}{dt} R(f+tg)\, \bigg|_{t=0} = \frac{\alpha'(0) - \alpha(0)\beta'(0)}{\beta^2(0)} = \alpha'(0) - r\beta'(0)= 0. \tag{3} $$

Differentiating $\alpha$ and $\beta$ from (2) and substituting into (3), we have

$$ \begin{aligned} & 2(g,Hf) - 2r(g,f) = 0 \\ \implies & (g, Hf-rf) = 0 \\ \implies & Hf-rf = 0, \end{aligned} $$

where the last implication holds because $g$ is arbitrary.

This shows that $f$ is an eigenvector of $H$ corresponding to the eigenvalue $r$, which is the minimum of the Rayleigh quotient $R_H$. The rest of the proof follows easily by recursion. Let's rewrite $f$ and $r$ as $f_1$ and $r_1$, respectively. Define $X_1$ to be the orthogonal complement of $f_1$:

$$ X_1 := \big\{x\in X : (x, f_1)=0\big\}. $$

$H$ maps $X_1$ into itself, since for all $x \in X_1$,

$$ (Hx, f_1) = (x, Hf_1) = (x, r_1 f_1) = r_1(x, f_1) = 0. $$

Hence, the same minimization problem can be posed on $X_1$, yielding $f_2$ and $r_2$, where $r_2$ is the second smallest eigenvalue of $H$. In this way we build, by recursion, a full set of $n$ orthonormal eigenvectors with eigenvalues arranged in increasing order. $\square$
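The recursion in the proof can be mimicked numerically. The sketch below (the $5\times 5$ symmetric matrix is arbitrary, and `np.linalg.eigh` stands in for the inner Rayleigh-quotient minimization) repeatedly extracts the minimizer and restricts $H$ to its orthogonal complement, recovering all eigenvalues in increasing order:

```python
import numpy as np

# Sketch of the recursion: repeatedly minimize the Rayleigh quotient,
# then restrict H to the orthogonal complement of the minimizer just found.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2                       # an arbitrary symmetric (self-adjoint) map

basis = np.eye(5)                       # orthonormal basis of the current subspace X_j
eigvals, eigvecs = [], []
for _ in range(5):
    Hr = basis.T @ H @ basis            # H restricted to the current subspace
    w, V = np.linalg.eigh(Hr)           # inner minimization of the Rayleigh quotient
    f = basis @ V[:, 0]                 # minimizer f_{j+1}, the next eigenvector
    eigvals.append(w[0])
    eigvecs.append(f)
    basis = basis @ V[:, 1:]            # orthogonal complement of f inside the old subspace

print(np.allclose(eigvals, np.linalg.eigvalsh(H)))   # eigenvalues recovered in increasing order -> True
```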

Spectral Resolution

For a general linear map $A$, the Spectral Theorem decomposes the whole space $X$ as the direct sum of null spaces $N_{(A-\lambda I)^d}$ containing the generalized eigenvectors for the eigenvalue $\lambda$ with index $d$. Now suppose $\lambda_1,...,\lambda_k$ are the distinct eigenvalues of the self-adjoint map $H$. By Theorem 1', its eigenvectors are genuine, so the whole space $X$ can be decomposed into pairwise orthogonal eigenspaces:

$$ X = N_1 \oplus \cdots \oplus N_k, $$

where $N_i$ is the null space of $H-\lambda_i I$ and $N_i \perp N_j$ for $i\neq j$. This means that any $x\in X$ can be written as the sum of pairwise orthogonal vectors:

$$ x = x_1 + \cdots +x_k, \tag{4} $$

where $x_j = P_j x$ is the orthogonal projection of $x$ onto the null space $N_j$.

Applying $H$ on both sides, we see that

$$ Hx = \lambda_1 x_1 + \cdots + \lambda_k x_k. \tag{5} $$

Equations (4) and (5) can be rewritten in terms of linear mappings:

$$ \begin{aligned} I &= P_1 + \cdots + P_k \\ H &= \lambda_1 P_1 + \cdots + \lambda_k P_k \end{aligned} \tag{6} $$

Equation (6) gives a spectral resolution of $H$. By the properties of orthogonal projections, the maps $P_j$ must satisfy:

  • $P_j P_k=0$ for $j\neq k$

  • $P_j^2 = P_j$ for all $j$

  • $P_j^* = P_j.$

Using the first two properties it can be easily shown that

$$ p(H) = \sum_{j=1}^k p(\lambda_j)P_j, $$

for any polynomial $p$. This extends beyond polynomials: for any function $f$ defined on the spectrum of $H$, we may define $f(H) := \sum_{j=1}^k f(\lambda_j)P_j$.
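The resolution (6), the projection properties above, and the formula for $p(H)$ can all be checked numerically. A numpy sketch (the symmetric matrix is arbitrary; the polynomial $p(t)=t^2+t$ is an assumed example):

```python
import numpy as np

# Sketch: spectral resolution H = sum_j lambda_j P_j and p(H) = sum_j p(lambda_j) P_j.
rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2                                      # arbitrary symmetric example

lam, Q = np.linalg.eigh(H)
P = [np.outer(Q[:, j], Q[:, j]) for j in range(4)]     # orthogonal projections onto eigenvector lines

print(np.allclose(sum(P), np.eye(4)))                            # I = sum P_j          -> True
print(np.allclose(H, sum(l * Pj for l, Pj in zip(lam, P))))      # H = sum lambda_j P_j -> True
print(np.allclose(P[0] @ P[1], np.zeros((4, 4))))                # P_j P_k = 0          -> True
print(np.allclose(P[0] @ P[0], P[0]))                            # P_j^2 = P_j          -> True

# Functional calculus with the example polynomial p(t) = t^2 + t.
p = lambda t: t**2 + t
print(np.allclose(H @ H + H, sum(p(l) * Pj for l, Pj in zip(lam, P))))   # True
```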

Remark   Expanding $H$ in terms of the projections $P_j$ reduces its action to $k$ numbers, one eigenvalue per eigenspace of $H$, rather than a full $n\times n$ array of entries.


Example 1   Let $ H = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}. $

Clearly, the eigenvalues are $\lambda_1=1$ and $\lambda_2=2$, and the eigenspaces are the Cartesian coordinate axes spanned by $(1,0)'$ and $(0,1)'$. The spectral resolution in this case is simple to find:

$$ \begin{aligned} H &= \lambda_1 P_1 + \lambda_2P_2 \\ &= 1\times \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + 2\times \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}. \end{aligned} $$

Example 2   Let $ H = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}. $

The eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 4$, corresponding to eigenvectors $(1,1)'$ and $(-1,1)'$, two diagonal directions of the Cartesian plane. The projection matrix onto the diagonal $(1,1)'$ is given by

$$ P_1 = \begin{bmatrix}1 \\ 1 \end{bmatrix} \left(\begin{bmatrix}1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)^{-1} \begin{bmatrix}1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}. $$

The projection matrix onto the anti-diagonal direction $(-1,1)'$ is given similarly by

$$ P_2 = \begin{bmatrix}-1 \\ 1 \end{bmatrix} \left(\begin{bmatrix}-1 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix}\right)^{-1} \begin{bmatrix}-1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. $$

Hence, the spectral resolution of $H$ is given by

$$ \begin{aligned} H &= \lambda_1 P_1 + \lambda_2P_2 \\ &= 2\times\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + 4\times\frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. \end{aligned} $$

From this we can easily find any power of $H$:

$$ H^n = 2^n \times\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + 4^n \times\frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. $$

Just for comparison, another way to raise $H$ to the power of $n$ is through diagonalization:

$$ H^n = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2^n & 0 \\ 0 & 4^n \end{bmatrix} \frac{1}{2} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}. $$
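Both routes give the same answer, as a short numpy check confirms (a sketch; the power $n=5$ is an arbitrary choice):

```python
import numpy as np

# Sketch: H^n from the spectral resolution vs. a direct matrix power, for the H of Example 2.
H = np.array([[3., -1.], [-1., 3.]])
P1 = 0.5 * np.array([[1., 1.], [1., 1.]])     # projection onto span{(1,1)'}
P2 = 0.5 * np.array([[1., -1.], [-1., 1.]])   # projection onto span{(-1,1)'}

n = 5                                         # arbitrary power
Hn_spectral = 2**n * P1 + 4**n * P2
print(np.allclose(Hn_spectral, np.linalg.matrix_power(H, n)))   # True
```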

Example 3   (Fourier series)

In many cases we wish to estimate a functional relationship between an input and an output. If the relationship is cyclic in nature, it can be represented by a periodic function $u(t)$ defined on $[0,1]$. The vector $u=(u_1,...,u_n)$ with components $u_k = u(k/n)$ is a discrete approximation of $u(t)$. In forecasting, it is often assumed that the existing pattern will continue in the future. Let $H = S$ be the cyclic shift map defined as

$$ S(u_1,...,u_n) = (u_2,u_3,...,u_n,u_1). $$

Clearly the map $S$ is an isometry, since any permutation of the coordinates preserves the Euclidean norm. An isometric linear map $U$ is called a unitary map, and it satisfies

$$ U^*U = UU^* = I. $$

It follows that $S$ is unitary, hence normal ($SS^*=S^*S$); by the spectral theorem for normal maps, it possesses an orthonormal basis of eigenvectors $e^{(j)}, j=1,...,n$, that spans $\mathbb{C}^n$. Furthermore, the eigenvalues are complex numbers with absolute value 1. To see this, let $e$ be an eigenvector of $S$ with eigenvalue $\lambda$. Since $S$ is an isometry, we have $||Se|| = |\lambda|\,||e|| = ||e||$, so that $|\lambda|=1$.
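A brief numpy check (sketch; the length $n=6$ is an arbitrary choice) that the shift matrix is unitary and normal, though not symmetric:

```python
import numpy as np

# Sketch: the cyclic shift S(u_1,...,u_n) = (u_2,...,u_n,u_1) as an n x n matrix.
n = 6
S = np.roll(np.eye(n), -1, axis=0)        # row k of S picks out component k+1 (mod n)

u = np.arange(1, n + 1, dtype=float)
print(np.allclose(S @ u, np.roll(u, -1)))          # S u = (u_2,...,u_n,u_1) -> True
print(np.allclose(S.T @ S, np.eye(n)))             # unitary: S* S = I        -> True
print(np.allclose(S @ S.T, S.T @ S))               # normal:  S S* = S* S     -> True
print(np.allclose(S, S.T))                         # but not self-adjoint     -> False
```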

Let $e = (e_1,...,e_n)$ be an eigenvector. From the equation $Se=\lambda e$, the components $e_i$ satisfy

$$ e_2 = \lambda e_1, \,\, e_3 = \lambda e_2,\,\, ...,\,\, e_n=\lambda e_{n-1},\,\, e_1 = \lambda e_n. $$

Setting $e_1=\lambda$, we see that the first $n-1$ equations become

$$ e_2 = \lambda^2, \,\, e_3 = \lambda^3, \,\, ..., \,\, e_n = \lambda^n. $$

The last equation becomes $\lambda = \lambda^{n+1}$, or $\lambda^n = 1$. It follows that the spectrum of $S$ consists of the $n$th roots of unity,

$$ \lambda_j = \text{exp}\left(i\frac{2\pi j}{n}\right), \qquad j=1,...,n. $$

And their corresponding eigenvectors are

$$ e^{(j)} = (\lambda_j, \lambda_j^2,...,\lambda_j^n). $$

Note that since $|\lambda_j|=1$, we have $||e^{(j)}|| = \sqrt{n}$ for all $j$. In components,

$$ e^{(j)}_k = \lambda_j^k = \text{exp}\left(i\frac{2\pi jk}{n} \right). $$
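These eigenpairs can be verified directly (numpy sketch; $n=8$ is an arbitrary choice):

```python
import numpy as np

# Sketch: the n-th roots of unity lambda_j and the vectors e^(j)_k = exp(i 2*pi*j*k/n)
# are eigenpairs of the cyclic shift S.
n = 8
S = np.roll(np.eye(n), -1, axis=0)

k = np.arange(1, n + 1)
for j in range(1, n + 1):
    lam_j = np.exp(1j * 2 * np.pi * j / n)
    e_j = np.exp(1j * 2 * np.pi * j * k / n)             # e^(j) = (lam_j, lam_j^2, ..., lam_j^n)
    assert np.allclose(S @ e_j, lam_j * e_j)              # S e^(j) = lam_j e^(j)
    assert np.isclose(np.linalg.norm(e_j), np.sqrt(n))    # ||e^(j)|| = sqrt(n)
print("all eigenpairs verified")
```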

Remark   It is important to note that by Euler's formula, we have that

$$ \text{exp}\left(i\theta\right) = \cos\theta + i\sin\theta.$$

It can be seen that the $\lambda_j$ are $n$ equally spaced points on the unit circle in the complex plane; this is their geometric significance.

Now with this orthogonal basis $e^{(1)},...,e^{(n)}$, the discrete approximation $u=(u_1,...,u_n)$ can be represented by

$$ u = \sum_{j=1}^n a_j e^{(j)}. \tag{7} $$

In particular, the $k$th component of $u$ is

$$ u_k = \sum_{j=1}^n a_j \text{exp}\left(i \frac{2\pi jk}{n} \right). \tag{8} $$

The coefficients $a_j$ can be obtained by projecting $u$ onto the basis vectors $e^{(j)}$, each of which has norm $\sqrt{n}$:

$$ a_j = \frac{(u,e^{(j)})}{(e^{(j)}, e^{(j)})} = \frac{1}{n}(u, e^{(j)}) = \frac{1}{n}\sum_{k=1}^n u_k \text{exp}\left(-i\frac{2\pi jk}{n}\right). \tag{9} $$

(Note: Since we are in a complex vector space, the Hermitian inner product is used: $(u, v) = \sum_k u_k\bar{v}_k$ for vectors, or $(u,v)=\int u\bar{v}$ for functions, where $\bar{v}$ is the complex conjugate of $v$.)
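A numpy sketch of formulas (8) and (9) on an arbitrary real vector $u$: compute the coefficients $a_j$ by the projection formula and confirm that they reconstruct $u$ exactly.

```python
import numpy as np

# Sketch: discrete Fourier coefficients a_j (formula (9)) and reconstruction of u (formula (8)).
n = 8
rng = np.random.default_rng(5)
u = rng.standard_normal(n)                       # arbitrary real signal of length n

k = np.arange(1, n + 1)
# a_j = (1/n) sum_k u_k exp(-i 2*pi*j*k/n), for j = 1,...,n
a = np.array([np.mean(u * np.exp(-1j * 2 * np.pi * j * k / n)) for j in range(1, n + 1)])

# u_k = sum_j a_j exp(i 2*pi*j*k/n)
u_rec = np.array([np.sum(a * np.exp(1j * 2 * np.pi * np.arange(1, n + 1) * kk / n)) for kk in k])
print(np.allclose(u_rec, u))                     # reconstruction is exact -> True
```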

If we treat $t = k/n \in [0,1]$ as time, then the components $u_k = u(k/n)$ are samples of a continuous function $u(t)$ of period 1. Its Fourier series representation is

$$ u(t) = \sum_{j=1}^n a_j \text{exp}\left(i\, 2\pi jt \right), $$

where the Fourier coefficients are given by

$$ \begin{aligned} a_j &= \int_0^1 u(t)\, \text{exp}\left(-i\, 2\pi jt\right) dt \\ &\simeq \frac{1}{n} \sum_{k=1}^n u(k/n)\, \text{exp}\left(-i\, \frac{2\pi jk}{n}\right), \end{aligned} $$

which is exactly the discrete formula (9).
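As a sketch of this approximation (the test signal $u(t)=\cos(2\pi t)$ is an assumed example, whose exact first coefficient is $a_1 = 1/2$), the Riemann sum at the sample points reproduces the integral value:

```python
import numpy as np

# Sketch: Riemann-sum approximation of the Fourier coefficient a_1
# for the assumed test signal u(t) = cos(2*pi*t), whose exact integral coefficient is 1/2.
n = 16
t = np.arange(1, n + 1) / n
u = np.cos(2 * np.pi * t)

a1 = np.mean(u * np.exp(-1j * 2 * np.pi * 1 * t))
print(np.isclose(a1, 0.5))          # matches the exact value 1/2 -> True
```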