Jan. 20, 2020 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\newcommand{\st}{\,\big|\,}$ $\newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$ $\newcommand{\P}[1]{\mathbb{P}\left(#1\right)}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

It is a highlight of linear algebra that, given an $n\times p$ matrix $X$, there exists a decomposition $X=UDV\tr$ where $D$ is a $p \times p$ diagonal matrix whose diagonal entries are the square roots of the descending eigenvalues $d_1^2\geq \cdots \geq d_p^2 \geq 0$ shared by $X X\tr$ and $X\tr X$. These eigenvalues exist and are real because $X X\tr$ and $X\tr X$ are self-adjoint. The matrices $U_{n\times p}$ and $V_{p\times p}$ have orthonormal columns consisting of eigenvectors of $X X\tr$ and $X\tr X$ respectively, so that $U\tr U=V\tr V = I_p$.
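This can be checked numerically. A NumPy sketch on a random matrix; `np.linalg.svd` with `full_matrices=False` returns the thin decomposition described here:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.standard_normal((n, p))

# Thin SVD: X = U D V^T with U (n x p), singular values d, V^T (p x p)
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# The squared singular values are the eigenvalues of X^T X, in descending order
eigs = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
print(np.allclose(d**2, eigs))              # True
print(np.allclose(U.T @ U, np.eye(p)))      # True: orthonormal columns
print(np.allclose(Vt @ Vt.T, np.eye(p)))    # True
print(np.allclose(X, U @ np.diag(d) @ Vt))  # True: X = U D V^T
```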

Given a unit vector $v\in \mathbb{R}^p$, the orthogonal projection onto the linear subspace spanned by $v$ is represented by the self-adjoint map $P=vv\tr$. It follows that if we project each row of $X$ onto $v$, the resulting matrix $Y := X P$ has rank 1 and its columns are all multiples of the **derived variable** $z := Xv$. We seek the **principal component direction** $v_1$ that maximizes the total variance of the derived variable $z$, given by $z\tr z = v\tr(X\tr X)v$. This is a quadratic form in a self-adjoint map $H = X\tr X$; when $H$ acts on a real Euclidean space, self-adjoint simply means **symmetric**. We have the following important result.

**Theorem 1** The quadratic form $q(x) = (x, Hx)$, where $H$ is self-adjoint and $\norm{x}=1$, is maximized (or minimized) by the unit eigenvector of $H$ corresponding to the largest (or smallest) eigenvalue.
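Before proving it, here is a quick numerical illustration of Theorem 1 (a NumPy sketch on a random symmetric matrix; `np.linalg.eigh` returns eigenvalues in ascending order with orthonormal eigenvectors):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2                # symmetric, hence self-adjoint

w, F = np.linalg.eigh(H)         # eigenvalues ascending; columns of F orthonormal

q = lambda x: x @ H @ x          # the quadratic form q(x) = (x, Hx)

print(np.isclose(q(F[:, -1]), w[-1]))  # True: q attains its max at the top eigenvector
print(np.isclose(q(F[:, 0]), w[0]))    # True: and its min at the bottom eigenvector
for _ in range(1000):                  # no unit vector does better
    x = rng.standard_normal(5)
    x /= np.linalg.norm(x)
    assert w[0] - 1e-9 <= q(x) <= w[-1] + 1e-9
```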

By taking $H=X\tr X$, Theorem 1 shows that $v_1$ is the first column of $V$. Its derived variable, $z_1 = UDV\tr v_1 = d_1 u_1$, is proportional to the first column of $U$ and is often referred to as the **first principal component** of $X$. It turns out that **Theorem 1** is equivalent to the fundamental theorem of Spectral Theory:

**Theorem 1'** A self-adjoint map $H$ acting on a Euclidean space $X$ has a set of eigenvalues (called the **spectrum** of $H$) and genuine eigenvectors that form an orthonormal basis of $X$.

First we show that Theorem 1 and Theorem 1' are essentially the same before proving Theorem 1'.

*Proof* Suppose **Theorem 1'** is true. That is, suppose $X$ has an orthonormal basis $f_1,...,f_n$ of eigenvectors of $H$ whose corresponding eigenvalues are $a_1,...,a_n$. Every vector $x\in X$ can be written as a linear combination of the basis vectors:

$$ x = \sum z_i f_i = Mz, $$

where $M$ is the matrix whose columns are $f_1,...,f_n$ and $z$ collects the coordinates $z_i$.

Applying $H$ to the above expression and using the fact that $Hf_i=a_if_i$, we have that

$$ Hx = \sum z_iHf_i = \sum z_i a_i f_i. $$

After introducing the new variable $z$, the quadratic form $q(x)=(x,Hx)$ can be diagonalized since

$$ \begin{aligned} (x, Hx) &=(Mz, HMz) = (z, M^*HMz) \\ & = \left(\sum z_if_i, \sum z_i a_i f_i\right) = \sum a_i z_i^2. \\ \end{aligned} \tag{1} $$

It follows that $D= M^*HM = \text{diag}(a_1,...,a_n)$ is the diagonal matrix in the following spectral decomposition:

$$ H = MDM^*. $$

Define the **Rayleigh quotient** as $R(x) = (x, Hx)\big/(x,x)$, which using the formulation (1) can be written in terms of the new variable $z$:

$$ R(x) = \frac{\sum a_i z_i^2}{\sum z_i^2}. $$

Without loss of generality, suppose the eigenvalues $a_1\leq \cdots \leq a_n$ are arranged in increasing order. We can easily see that by either taking $z_1 \neq 0$ or $z_n \neq 0$ while setting the rest $z_i = 0$, the maximum and the minimum of the Rayleigh quotient are $a_n$ and $a_1$ which are achieved at $f_n$ and $f_1$ respectively. That is,

$$ \begin{aligned} a_1 &= \min_x R(x) = R(f_1)\\ a_n &= \max_x R(x) = R(f_n). \end{aligned} $$

By the homogeneity property $R(kx) = R(x)$ for all $k\neq 0$, optimizing the Rayleigh quotient is equivalent to optimizing the quadratic form $q(x)$ constrained to the unit sphere $\norm{x}=1$. That is,

$$ \min_x \frac{(x, Hx)}{(x,x)} = \min_{\norm{x}=1} q(x). $$

The existence of the minimum is guaranteed by the **Heine-Borel theorem**, since $q$ is continuous and the unit sphere is compact. Name the minimizer $f$; for any other vector $g$ and $t\in \mathbb{R}$, the real-valued function $R(f+tg)$ can be expressed as

$$ R(f+tg) = \frac{\big(f+tg,\, H(f+tg)\big)}{(f+tg,\, f+tg)} =: \frac{\alpha(t)}{\beta(t)}. \tag{2} $$

Since $f$ minimizes $R(x)$, the minimum of $R(f+tg)$ must occur at $t=0$, where $\beta(0)=(f,f)=1$. Let $r:=\alpha(0)/\beta(0)$ denote the minimum of $R(x)$, attained at $f$. Using calculus, we have

$$ \frac{d}{dt} R(f+tg)\, \bigg|_{t=0} = \frac{\alpha'(0)\beta(0) - \alpha(0)\beta'(0)}{\beta^2(0)} = \alpha'(0) - r\beta'(0)= 0. \tag{3} $$

Differentiating $\alpha$ and $\beta$ from (2) and substituting them into (3), we have

$$ \begin{aligned} & 2(g,Hf) - 2r(g,f) = 0 \\ \implies & (g, Hf-rf) = 0 \\ \implies & Hf-rf = 0, \end{aligned} $$

where the last implication holds because $g$ is arbitrary. This shows that $f$ is an eigenvector of $H$ corresponding to the eigenvalue $r$, which is the minimum of the Rayleigh quotient $R_H$. The rest of the proof follows easily by recursion. Let's rewrite $f$ and $r$ as $f_1$ and $r_1$, respectively. Define $X_1$ to be the orthogonal complement of $f_1$:

$$ X_1 := \big\{x\in X : (x, f_1)=0\big\}. $$

$H$ maps $X_1$ into $X_1$ since for all $x \in X_1$,

$$ (Hx, f_1) = (x, Hf_1) = (x, r_1 f_1) = r_1(x, f_1) = 0. $$

Hence, the same minimization problem can be posed on $X_1$, yielding $f_2$ and $r_2$, where $r_2$ is the second smallest eigenvalue of $H$. In this way we build, by recursion, a full set of $n$ orthogonal eigenvectors with eigenvalues arranged in increasing order. $\square$
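The recursion at the heart of the proof can be sketched numerically with NumPy: at each step, restrict $H$ to the current subspace, take the minimizer of the Rayleigh quotient there, and pass to the orthogonal complement of that minimizer. (A sketch on a random symmetric matrix; `np.linalg.eigh` solves each restricted minimization.)

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2                  # a symmetric (self-adjoint) matrix

n = H.shape[0]
B = np.eye(n)                      # orthonormal basis of the current subspace X_i
vals, vecs = [], []
for _ in range(n):
    Hr = B.T @ H @ B               # H restricted to X_i (still symmetric)
    w, V = np.linalg.eigh(Hr)
    r, f = w[0], B @ V[:, 0]       # min of the Rayleigh quotient on X_i, and its minimizer
    vals.append(r)
    vecs.append(f)
    B = B @ V[:, 1:]               # next subspace: orthogonal complement of f inside X_i

# The r_i appear in increasing order and are exactly the eigenvalues of H,
# and the minimizers f_i form an orthonormal set of eigenvectors.
print(np.allclose(vals, np.linalg.eigvalsh(H)))   # True
F = np.column_stack(vecs)
print(np.allclose(F.T @ F, np.eye(n)))            # True
```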

By the Spectral Theorem for self-adjoint mappings, the whole space $X$ can be decomposed as the direct sum of null spaces $N_{(H-\lambda I)^d}$ containing the generalized eigenvectors for the eigenvalue $\lambda$ with index $d$. Suppose $\lambda_1,...,\lambda_k$ are the distinct eigenvalues of $H$. By **Theorem 1'**, the eigenvectors are genuine, so the whole space $X$ can be decomposed into pairwise orthogonal eigenspaces:

$$ X = N_1 \oplus \cdots \oplus N_k, $$

where $N_i$ is the null space of the matrix $H-\lambda_i I$ and $N_i \perp N_j$ for $i\neq j$. This means that any $x\in X$ can be written as the sum of pairwise orthogonal vectors:

$$ x = x_1 + \cdots +x_k, \tag{4} $$

where $x_j = P_j x$ is the orthogonal projection of $x$ onto the null space $N_j$.

Applying $H$ on both sides, we see that

$$ Hx = \lambda_1 x_1 + \cdots + \lambda_k x_k. \tag{5} $$

Equations (4) and (5) can be rewritten in terms of linear mappings:

$$ \begin{aligned} I &= P_1 + \cdots + P_k \\ H &= \lambda_1 P_1 + \cdots + \lambda_k P_k \end{aligned} \tag{6} $$

Equation (6) gives a **spectral resolution** of $H$. By the properties of the orthogonal projection, the maps $P_j$ must satisfy:

- $P_j P_k=0$ for $j\neq k$,
- $P_j^2 = P_j$ for all $j$,
- $P_j^* = P_j$.
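The resolution (6) and these three properties are easy to verify numerically. A NumPy sketch, assuming a generic random symmetric $H$ (its eigenvalues are distinct, so each eigenspace is spanned by a single eigenvector $v_j$ and $P_j = v_j v_j\tr$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = (A + A.T) / 2                  # symmetric, with (generically) distinct eigenvalues

w, V = np.linalg.eigh(H)
# Rank-one orthogonal projections onto the eigenspaces
P = [np.outer(V[:, j], V[:, j]) for j in range(4)]

print(np.allclose(sum(P), np.eye(4)))                      # True: I = P_1 + ... + P_k
print(np.allclose(sum(w[j] * P[j] for j in range(4)), H))  # True: H = sum lambda_j P_j
print(np.allclose(P[0] @ P[1], np.zeros((4, 4))))          # True: P_j P_k = 0 for j != k
print(np.allclose(P[0] @ P[0], P[0]))                      # True: P_j^2 = P_j
print(np.allclose(P[0].T, P[0]))                           # True: P_j^* = P_j
```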

Using the first two properties it can be easily shown that

$$ p(H) = \sum_{j=1}^k p(\lambda_j)P_j, $$

for any polynomial $p$. This property can be generalized by replacing $p$ with any differentiable function $f$.
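A quick numerical check of this identity (a NumPy sketch; the polynomial and the random symmetric matrix are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
H = (A + A.T) / 2

w, V = np.linalg.eigh(H)
P = [np.outer(V[:, j], V[:, j]) for j in range(3)]   # spectral projections

p = np.polynomial.Polynomial([1.0, -2.0, 3.0])       # p(x) = 1 - 2x + 3x^2
lhs = np.eye(3) - 2 * H + 3 * (H @ H)                # p(H) computed directly
rhs = sum(p(w[j]) * P[j] for j in range(3))          # sum_j p(lambda_j) P_j
print(np.allclose(lhs, rhs))                         # True
```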

**Remark** Using the projections $P_j$ as a basis, we are reducing the dimension of a self-adjoint matrix $H$ to the number of eigenspaces of $H$.

**Example 1** Let

$$ H = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}. $$

Clearly, the eigenvalues are $\lambda_1=1$ and $\lambda_2=2$, and the eigenspaces are the Cartesian coordinate axes spanned by $(1,0)'$ and $(0,1)'$. The spectral resolution in this case is simple to find:

$$ \begin{aligned} H &= \lambda_1 P_1 + \lambda_2P_2 \\ &= 1\times \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + 2\times \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}. \end{aligned} $$

**Example 2** Let

$$ H = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}. $$

The eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 4$, corresponding to eigenvectors $(1,1)'$ and $(-1,1)'$, two diagonal directions of the Cartesian plane. The projection matrix onto the diagonal $(1,1)'$ is given by

$$ P_1 = \begin{bmatrix}1 \\ 1 \end{bmatrix} \left(\begin{bmatrix}1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)^{-1} \begin{bmatrix}1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}. $$

The projection matrix onto the off-diagonal $(-1,1)'$ is given similarly by

$$ P_2 = \begin{bmatrix}-1 \\ 1 \end{bmatrix} \left(\begin{bmatrix}-1 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix}\right)^{-1} \begin{bmatrix}-1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. $$

Hence, the spectral resolution of $H$ is given by

$$ \begin{aligned} H &= \lambda_1 P_1 + \lambda_2P_2 \\ &= 2\times\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + 4\times\frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. \end{aligned} $$

From this we can easily find any power of $H$:

$$ H^n = 2^n \times\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + 4^n \times\frac{1}{2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}. $$

Just for comparison, another way to raise $H$ to the power of $n$ is through diagonalization:

$$ H^n = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2^n & 0 \\ 0 & 4^n \end{bmatrix} \frac{1}{2} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}. $$

**Example 3** (Fourier series)

In many cases we wish to estimate a function $f$ between the input $X$ and the output $Y$. If such a function is cyclic in nature, it can be represented by a periodic function $u(t)$ defined on $[0,1]$. The vector $u=(u_1,...,u_n)$, with $u_k = u(k/n)$, is a discrete approximation of $u(t)$. In forecasting, it is often assumed that the existing pattern will continue into the future. Let $H = S$ be the cyclic shift map defined as

$$ S(u_1,...,u_n) = (u_2,u_3,...,u_n,u_1). $$

Clearly the map $S$ is an isometry, since any permutation preserves the Euclidean norm. An isometric linear map $U$ is called a **unitary map**, and satisfies

$$ U^*U = UU^* = I, \quad \text{i.e.,} \quad U^* = U^{-1}. $$

It follows that $S$ is unitary, hence normal, so by the spectral theorem it possesses an orthonormal basis of eigenvectors $e^{(j)}, j=1,...,n$, that spans $\mathbb{C}^n$. Furthermore, the eigenvalues are complex numbers with absolute value 1. To see this, let $e$ be an eigenvector of $S$ with eigenvalue $\lambda$. Since $S$ is an isometry, we have $||Se|| = |\lambda|\,||e|| = ||e||$, so that $|\lambda|=1$.
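A small NumPy sketch of these facts, building $S$ as a cyclic permutation matrix:

```python
import numpy as np

n = 6
# S sends (u_1,...,u_n) to (u_2,...,u_n,u_1): roll the rows of the identity up by one
S = np.roll(np.eye(n), -1, axis=0)

u = np.arange(1, n + 1, dtype=float)
print(S @ u)                              # [2. 3. 4. 5. 6. 1.]
print(np.allclose(S.T @ S, np.eye(n)))    # True: S is unitary (here, real orthogonal)

lam = np.linalg.eigvals(S)
print(np.allclose(np.abs(lam), 1.0))      # True: all eigenvalues lie on the unit circle
```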

Let $e = (e_1,...,e_n)$ be an eigenvector. From the equation $Se=\lambda e$, the components $e_i$ satisfy

$$ e_2 = \lambda e_1, \,\, e_3 = \lambda e_2,\,\, ...,\,\, e_n=\lambda e_{n-1},\,\, e_1 = \lambda e_n. $$

Setting $e_1=\lambda$, we see that the first $n-1$ equations become

$$ e_2 = \lambda^2, \,\, e_3 = \lambda^3, \,\, ..., \,\, e_n = \lambda^n. $$

The last equation becomes $\lambda = \lambda^{n+1}$, or $1 = \lambda^n$. It follows that the spectrum of $S$ consists of the $n$th roots of unity, represented by

$$ \lambda_j = \text{exp}\left(i\frac{2\pi j}{n}\right). $$

Their corresponding eigenvectors are

$$ e^{(j)} = (\lambda_j, \lambda_j^2,...,\lambda_j^n). $$

Note that since $|\lambda_j|=1$, $||e^{(j)}|| = \sqrt{n}$ for all $j$. In general, we have

$$ e^{(j)}_k = \lambda_j^k = \text{exp}\left(i\frac{2\pi jk}{n} \right). $$

**Remark** It is important to note that by Euler's identity, we have that

$$ \text{exp}\left(i\theta\right) = \cos\theta + i\sin\theta. $$

It can be seen that the $\lambda_j$ are located on the unit circle in the complex plane, which shows their geometric significance.

Now with this orthogonal basis $e^{(1)},...,e^{(n)}$, the discrete approximation $u=(u_1,...,u_n)$ can be represented by

$$ u = \sum_{j=1}^n a_j e^{(j)}. \tag{7} $$

In particular, the $k$th component of $u$ is

$$ u_k = \sum_{j=1}^n a_j \text{exp}\left(i \frac{2\pi jk}{n} \right). \tag{8} $$

The coefficients $a_j$ can be obtained by projecting $u$ onto the basis vector $e^{(j)}$, which has norm $\sqrt{n}$:

$$ a_j = \frac{(u,e^{(j)})}{(e^{(j)}, e^{(j)})} = \frac{1}{n}(u, e^{(j)}) = \frac{1}{n}\sum_{k=1}^n u_k \text{exp}\left(-i\frac{2\pi jk}{n}\right). \tag{9} $$

(Note: since we are in a complex vector space, the **Hermitian inner product** $(u, v) = \sum_k u_k\bar{v}_k$ is used, where $\bar{v}$ is the complex conjugate of $v$; for functions the analogous product is $\int u\bar{v}$.)
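Equations (8) and (9) are, up to indexing, the inverse and forward discrete Fourier transform. A NumPy sketch (with $j,k$ running from 1 to $n$ as above, rather than NumPy's usual 0-based convention) verifying that computing the $a_j$ via (9) and substituting them back into (8) reconstructs $u$:

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
u = rng.standard_normal(n)          # samples u_1, ..., u_n
k = np.arange(1, n + 1)
j = np.arange(1, n + 1)

# Coefficients from equation (9): a_j = (1/n) sum_k u_k exp(-i 2 pi j k / n)
a = np.array([(u * np.exp(-2j * np.pi * jj * k / n)).sum() / n for jj in j])

# Reconstruction from equation (8): u_k = sum_j a_j exp(i 2 pi j k / n)
u_rec = np.array([(a * np.exp(2j * np.pi * j * kk / n)).sum() for kk in k])

print(np.allclose(u_rec, u))        # True: the basis e^(j) is orthogonal with norm sqrt(n)
```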

If we treat $t = k/n \in [0,1]$ as **time**, the samples $u_k = u(k/n)$ come from a continuous function $u(t)$ of period 1. Its Fourier series representation is

$$ u(t) = \sum_{j} a_j\, \text{exp}\left(i\, 2\pi jt\right), $$

where the Fourier coefficients are given by

$$ \begin{aligned} a_j &= \int_0^1 u(t)\, \text{exp}\left(-i\, 2\pi jt\right) dt \\ &\simeq \frac{1}{n} \sum_{k=1}^n u(k/n)\, \text{exp}\left(-i\, 2\pi jk/n \right), \end{aligned} $$

which recovers equation (9).
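As a sanity check of this approximation, here is a NumPy sketch with the assumed test function $u(t)=\cos 2\pi t$, whose Fourier coefficient at $j=1$ is exactly $1/2$:

```python
import numpy as np

# A smooth 1-periodic test function with a known Fourier coefficient:
# u(t) = cos(2 pi t) has a_1 = 1/2.
u = lambda t: np.cos(2 * np.pi * t)

n = 64
k = np.arange(1, n + 1)
t = k / n                                         # sample times k/n
jj = 1
a1 = (u(t) * np.exp(-2j * np.pi * jj * t)).sum() / n   # Riemann sum for the integral
print(np.isclose(a1, 0.5))                             # True: matches the exact value
```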