March 14, 2019 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\newcommand{\st}{\,\big|\,}$ $\newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$ $\newcommand{\P}[1]{\mathbb{P}\left(#1\right)}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

A linear transformation $T:U\to V$, where $U$ and $V$ have dimensions $n$ and $m$, can be represented by a matrix $A$ if we specify a basis for $U$ and $V$. Suppose $\mathcal{A}$ and $\mathcal{B}$ are bases for $U$ and $V$ respectively. Then the linear map $T$ can be written as

$$ \begin{aligned} A : \mathbb{F}^n &\to \mathbb{F}^m \\ [u]_{\mathcal{A}} & \mapsto A [u]_{\mathcal{A}} := [T]_{\mathcal{A}}^{\mathcal{B}}[u]_{\mathcal{A}}. \end{aligned} $$

Here $[u]_{\mathcal{A}}$ denotes the coordinate vector of $u$ with respect to the basis $\mathcal{A}$,

$$ [u]_{\mathcal{A}} = (l_1(u),...,l_n(u))', $$

where $l_1,...,l_n$ is the dual basis of $U$ associated with $\mathcal{A}$.

Let $\mathcal{A} = \{a_1,...,a_n\}$ and let $\mathcal{B} = \{b_1,...,b_m\}$. From the rules of matrix multiplication, we can write

$$ A = [T]_{\mathcal{A}}^{\mathcal{B}} = \big[[T(a_1)]_{\mathcal{B}} \st [T(a_2)]_{\mathcal{B}} \st \cdots \st [T(a_n)]_{\mathcal{B}} \big]. \tag{1} $$

To gain some intuition behind equation (1), we use the fact that a linear map $T$ is completely determined by its values at the basis elements. For example, the first basis element $a_1$ is represented by

$$ [a_1]_{\mathcal{A}} = (1,0,...,0)'. $$

So multiplying the matrix $A$ by $[a_1]_{\mathcal{A}}$ yields the first column of $A$ as the value of $T(a_1)$ written in terms of the basis $\mathcal{B}$. Using similar arguments for $a_2,...,a_n$, we can completely determine the matrix $A$.
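
As a quick illustration of equation (1), here is a sketch in NumPy of building the matrix of a linear map column by column. The map $T$ and its formula are made up for this example; both spaces use their standard bases.

```python
import numpy as np

# A hypothetical linear map T : R^3 -> R^2 (made up for illustration):
# T(x, y, z) = (x + 2y, 3z).
def T(u):
    x, y, z = u
    return np.array([x + 2 * y, 3 * z], dtype=float)

# By equation (1), the j-th column of the matrix A is T applied to the
# j-th standard basis vector (the rows of np.eye(n)).
n = 3
A = np.column_stack([T(e) for e in np.eye(n)])

# The matrix now reproduces the map on any vector.
u = np.array([1.0, -1.0, 2.0])
assert np.allclose(A @ u, T(u))
```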

Let $A := [T]_{\mathcal{I}}^{\mathcal{I}}$ be the matrix representation of a linear transformation $T: \mathbb{F}^n \to \mathbb{F}^n$ with respect to the standard basis $\mathcal{I} = \{e_1,...,e_n\}$ of $\mathbb{F}^n$. Here we are working directly in $\mathbb{F}^n$, so any vector $u=(u_1,...,u_n)'$, by default, is represented in terms of the standard basis; that is, $u=u_1e_1 + \cdots + u_n e_n$. Also suppose there are $n$ linearly independent eigenvectors $\mathcal{B} = \{v_1,v_2,...,v_n\}$ such that $Av_i = \lambda_i v_i$. Using a change of basis, we have

$$ \begin{aligned} A = [T]_{\mathcal{I}}^{\mathcal{I}} = [I]_{\mathcal{B}}^{\mathcal{I}} [T]_{\mathcal{B}}^{\mathcal{B}} [I]_{\mathcal{I}}^{\mathcal{B}} := S B S^{-1}, \end{aligned} \tag{2} $$

where $I$ represents the identity map, and the matrix $S = [I]_{\mathcal{B}}^{\mathcal{I}}$ is called the `change of basis` matrix. Using (1), we have

$$ S = \big[[v_1]_{\mathcal{I}} \st [v_2]_{\mathcal{I}} \st \cdots \st [v_n]_{\mathcal{I}} \big] = \big[v_1 \st v_2 \st \cdots \st v_n\big]. $$

**Remark.** As a side note, any invertible matrix $A$ represents the identity map once we change the basis of the domain space to the columns of $A$. The corresponding change of basis matrix $[I]_{\mathcal{I}}^{\mathcal{B}}$ is precisely $A^{-1}$.

Similarly, the matrix $B$ can be written as

$$ \begin{aligned} B &= [T]_{\mathcal{B}}^{\mathcal{B}} \\ &= \big[ [Av_1]_{\mathcal{B}} \st [Av_2]_{\mathcal{B}} \st \cdots \st [Av_n]_{\mathcal{B}}\big] \\ &= \big[ [\lambda_1 v_1]_{\mathcal{B}} \st [\lambda_2 v_2]_{\mathcal{B}} \st \cdots \st [\lambda_n v_n]_{\mathcal{B}} \big] \\ &= \big[ \lambda_1 e_1 \st \lambda_2 e_2 \st \cdots \st \lambda_n e_n \big] \\ &= \text{diag}(\lambda_1,...,\lambda_n). \end{aligned} $$
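
We can sanity-check the factorization (2) numerically. The matrix below is a made-up example with distinct eigenvalues; `np.linalg.eig` returns an eigenvector matrix whose columns play the role of $S$, and conjugating by it should produce the diagonal matrix $B$:

```python
import numpy as np

# A made-up 2x2 example with distinct eigenvalues (5 and 2).
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of S are eigenvectors of A.
eigvals, S = np.linalg.eig(A)

# Changing basis to the eigenvectors diagonalizes A: B = S^{-1} A S.
B = np.linalg.inv(S) @ A @ S
assert np.allclose(B, np.diag(eigvals))

# And conversely A = S B S^{-1}, which is equation (2).
assert np.allclose(A, S @ np.diag(eigvals) @ np.linalg.inv(S))
```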

In this case the matrices $A$ and $B$ are `similar`, and $A$ is `diagonalizable`. So the question is: when do we have $n$ linearly independent eigenvectors?

**Theorem.** Eigenvectors that correspond to distinct eigenvalues are linearly independent.

*Proof* Let $\lambda_1,...,\lambda_k$ be distinct eigenvalues of $A$ corresponding to eigenvectors $v_1,...,v_k$. For the sake of contradiction, suppose the eigenvectors are linearly dependent. Then we can find a smallest positive integer $m$ such that
$$
\sum_{i=1}^m c_iv_i = 0, \quad c_m \neq 0. \tag{3}
$$

Multiplying this by $A$, we get

$$ \sum_{i=1}^m c_i Av_i = \sum_{i=1}^m c_i\lambda_iv_i = 0. \tag{4} $$

Multiplying (3) by $\lambda_m$ and subtracting (4), we have

$$ \lambda_m \sum_{i=1}^m c_i v_i- \sum_{i=1}^m c_i\lambda_i v_i = \sum_{j=1}^{m-1}(\lambda_m-\lambda_j)c_jv_j = 0. $$

Since $c_m v_m$ is nonzero, equation (3) implies that there must be at least one other nonzero coefficient among $c_1,...,c_{m-1}$. Also, since the eigenvalues are distinct, the differences $\lambda_m-\lambda_j, \, j=1,...,m-1$ are all nonzero. Hence there exists a nonzero coefficient among $(\lambda_m-\lambda_j)c_j,\, j=1,...,m-1$, which gives a shorter nontrivial relation and contradicts the minimality of $m$. $\blacksquare$
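
A small numerical illustration of the theorem, using a matrix built to have the distinct eigenvalues 1, 2, 3 (the change of basis matrix below is arbitrary, chosen for the example):

```python
import numpy as np

# Build a matrix with prescribed distinct eigenvalues 1, 2, 3 via an
# arbitrary invertible change of basis matrix S (made up for illustration).
S = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
A = S @ np.diag([1.0, 2.0, 3.0]) @ np.linalg.inv(S)

# The theorem guarantees the eigenvectors are linearly independent,
# i.e. the eigenvector matrix V has full rank.
_, V = np.linalg.eig(A)
assert np.linalg.matrix_rank(V) == 3
```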

A direct corollary of this theorem is the following important result.

**Corollary.** If $A$ is an $n\times n$ matrix with $n$ distinct eigenvalues, then the corresponding eigenvectors form a basis for $\mathbb{F}^n$ and $A$ is diagonalizable.

However, not all matrices have $n$ distinct eigenvalues. When the characteristic polynomial has repeated roots (it always has $n$ roots, counted with multiplicity, over an algebraically closed field $\mathbb{F}$), we need a more general decomposition.

A matrix $A \in \text{Mat}_{n\times n}(\mathbb{F})$ has an eigenvalue $\lambda$ if and only if $\lambda$ is a root of the `characteristic polynomial`:

$$ P_A(s) = \det(A-s I). $$

For any eigenvalue $\lambda$, the determinant is 0, so the null space of $A-\lambda I$ must be non-trivial. We call this null space the `eigenspace` of $\lambda$:

$$ N_1(\lambda) := N_{A-\lambda I}. $$

The `algebraic multiplicity` of an eigenvalue $\lambda$ is the largest power $m$ such that $(s-\lambda)^m$ divides the characteristic polynomial $P_A(s)$. The `geometric multiplicity` of $\lambda$ is the dimension of the null space $N_1(\lambda)$. As a side note, the two multiplicities are related as follows.

**Theorem.** The geometric multiplicity of an eigenvalue is bounded above by its algebraic multiplicity.

*Proof* For a matrix $A$, suppose the geometric multiplicity of $\lambda$ is $k$. Then we have $k$ linearly independent $\lambda$-eigenvectors $v_1,...,v_k$, which form a basis for a subspace of $\mathbb{F}^n$. Extend this to a basis for $\mathbb{F}^n$:

$$ \mathcal{B} = \{v_1,...,v_k, w_1,...,w_{n-k}\}. $$

Using a change of basis we can rewrite

$$ \begin{aligned} B &= [A]_{\mathcal{B}}^{\mathcal{B}} \\ &= \left[ [Av_1]_{\mathcal{B}} \st \cdots \st [Av_k]_{\mathcal{B}} \st \cdots \right] \\ &= \left[ \lambda[v_1]_{\mathcal{B}} \st \cdots \st \lambda[v_k]_{\mathcal{B}} \st \cdots \right] \\ &= \begin{bmatrix} \lambda I_k & D \\ 0 & C \end{bmatrix} \end{aligned} $$

Hence the characteristic polynomial of $B$ must contain the factor $(s-\lambda)^k$. It remains to show that $A$ and $B$ have the same characteristic polynomial. Since $A$ and $B$ are related by a change of basis, there exists an invertible matrix $S=[I]_{\mathcal{B}}^{\mathcal{I}}$ such that $A = SBS^{-1}$. We have

$$ \begin{aligned} P_A(s) &= \det(A - sI) \\ &= \det(SBS^{-1}-sI) \\ &= \det(SBS^{-1} - SsIS^{-1}) \\ &= \det(S(B-sI)S^{-1}) \\ &= \det(S)\det(B-sI)\det(S^{-1}) \\ &= \det(B-sI) \\ &= P_B(s). \end{aligned} $$

This proves the claim. $\blacksquare$
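
The gap between the two multiplicities is easiest to see on a Jordan block, where it is as large as possible. A minimal sketch (the matrix is constructed for the example):

```python
import numpy as np

# A 3x3 Jordan block with eigenvalue 2: its characteristic polynomial is
# (s - 2)^3, so the algebraic multiplicity of lambda = 2 is 3.
lam, n = 2.0, 3
J = lam * np.eye(n) + np.diag(np.ones(n - 1), k=1)

# The geometric multiplicity is the nullity of J - lam*I, via rank-nullity.
geo = n - np.linalg.matrix_rank(J - lam * np.eye(n))
assert geo == 1  # 1 <= 3: geometric <= algebraic, as the theorem states
```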

As a simple example, consider the matrix

$$ A = \begin{bmatrix} 3 & 2 \\ -2 & -1 \end{bmatrix}. $$

The characteristic polynomial of $A$ is given by $P_A(s) = s^2-2s+1 = (s-1)^2$. Hence $\lambda=1$ is the only eigenvalue with algebraic multiplicity 2. The matrix $A- \lambda I$, where $\lambda=1$, is

$$ A-\lambda I = \begin{bmatrix} 2 & 2 \\ -2 & -2 \end{bmatrix}. $$

Since the two columns of this matrix are identical, the eigenspace has dimension one, with an eigenvector $v_1=(1, -1)'$. This is not enough yet, since we need two linearly independent vectors to form a basis for $\mathbb{R}^2$. To find the missing basis vector for $\mathbb{R}^2$, we solve

$$ (A-\lambda I) v_2 = v_1. \tag{5} $$

Note that from (5), we have

$$ (A-\lambda I)^2 v_2 = (A-\lambda I) v_1 = 0. $$

So $v_2$ is really a vector in the null space $N_2(\lambda) = N_{(A-\lambda I)^2}$!

Furthermore, $v_2$ and $v_1$ must be linearly independent: if $v_2 = cv_1$ for some scalar $c$, then $(A-\lambda I)v_2 = c(A-\lambda I)v_1 = 0 \neq v_1$, a contradiction. In the example above, we solve

$$ \begin{bmatrix} 2 & 2 \\ -2 & -2 \end{bmatrix}v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, $$

which yields a solution $v_2= (1/2,0)'$. Finally we have a basis of $\mathbb{F}^2$ consisting of

$$ \mathcal{B} = \{(1, -1)', (1/2, 0)'\}. $$

The matrix $A$ expressed in terms of this new basis becomes

$$ [A]_{\mathcal{B}}^{\mathcal{B}} = \big[[Av_1]_{\mathcal{B}} \st [Av_2]_{\mathcal{B}} \big]. $$

Since $v_1$ is a `genuine eigenvector`, the first column is

$$ [Av_1]_{\mathcal{B}} = (\lambda, 0)' = (1,0)'. $$

The second column can be obtained by noting that $Av_2 = v_1 + \lambda v_2$, so

$$ [Av_2]_{\mathcal{B}} = (1, \lambda)' = (1, 1)'. $$

Hence, $A$ can be factorized into

$$ \begin{aligned} A &= \begin{bmatrix} 3 & 2 \\ -2 & -1 \end{bmatrix} \\ &= \begin{bmatrix} v_1 & v_2 \end{bmatrix} \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^{-1} \\ &=\begin{bmatrix} 1 & 1/2 \\ -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 2 & 2 \end{bmatrix}. \end{aligned} $$
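
This factorization can be verified numerically, using the generalized eigenvector $v_2 = (1/2, 0)'$ found by solving (5):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [-2.0, -1.0]])
v1 = np.array([1.0, -1.0])  # genuine eigenvector for lambda = 1
v2 = np.array([0.5, 0.0])   # generalized eigenvector: (A - I)v2 = v1

S = np.column_stack([v1, v2])
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])  # 2x2 Jordan block for lambda = 1

assert np.allclose((A - np.eye(2)) @ v2, v1)                     # chain relation (5)
assert np.allclose(np.linalg.matrix_power(A - np.eye(2), 2), 0)  # (A - I)^2 = 0
assert np.allclose(A, S @ J @ np.linalg.inv(S))                  # A = S J S^{-1}
```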

This method can be generalized.

A vector $v \in \mathbb{C}^n$ is called a `generalized eigenvector` of a square matrix $A$ with eigenvalue $\lambda$ if

$$(A-\lambda I)^m v=0, \text{ for some } m\in \mathbb{Z}_+.$$

Since $(A-\lambda I)x=0$ implies that $(A-\lambda I)^2x=0$, and so on, we have that

$$ N_1(\lambda) \subseteq N_2(\lambda) \subseteq \cdots $$

By the rank–nullity theorem, the dimension of the null space of $(A-\lambda I)^m$ is bounded above by $n$, the dimension of the ambient space. So eventually this chain of subspaces must stop growing. That is, there exists a smallest positive integer $d$ such that

$$ N_1(\lambda) \subsetneq \cdots \subsetneq N_d(\lambda) = N_{d+1}(\lambda) = \cdots. $$

We call this $d$ the `index` of $\lambda$, and the subspace $N_d(\lambda) = N_{(A-\lambda I)^d}$ contains all generalized eigenvectors. We have the following important theorem.

**Spectral Theorem.** Let $A$ be a complex $n \times n$ matrix with distinct eigenvalues $\lambda_1,..., \lambda_k$. Then$$ \mathbb{C}^n = N_{d_1}(\lambda_1) \oplus N_{d_2}(\lambda_2) \oplus \cdots \oplus N_{d_k}(\lambda_k), \tag{6}$$

where $d_i$ is the index of $\lambda_i$.

The `Jordan normal form` of a matrix can be constructed as follows. First, we fix an eigenvalue $\lambda$ of $A$. To simplify the notation, we write $N_i := N_i(\lambda)$.

**Lemma.** The matrix $A-\lambda I$ induces an injective map between quotient spaces:$$ \begin{aligned} \phi : N_{i+1}\big/ N_i & \to N_i \big/ N_{i-1} \\ x+N_i & \mapsto (A-\lambda I)x + N_{i-1} \end{aligned} \tag{7} $$

*Proof* For any $x\in N_{i+1}$, we have $(A-\lambda I)^{i+1} x = 0$, which can be rewritten as

$$ (A-\lambda I)^i [(A-\lambda I)x] = 0 \implies (A-\lambda I)x \in N_i. $$

This shows that the function maps to the right space. Now for any $x+N_i = y+N_i$ in the quotient space $N_{i+1}\big/N_i$, we have that $y=x+t$ for some $t\in N_i$. Hence

$$ (A-\lambda I)y = (A-\lambda I)(x+t) = (A-\lambda I)x \quad (\text{mod } N_{i-1}), $$

since $(A-\lambda I)t \in N_{i-1}$ by the same argument as before. This shows that $\phi$ is well-defined.

To show that $\phi$ is injective, take an element $x+N_i \in \text{Ker}(\phi)$. Then we must have that $(A-\lambda I)x \in N_{i-1}$, or

$$ (A-\lambda I)^{i-1}[(A-\lambda I)x] = 0 \implies x \in N_i, $$

so $x+N_i = 0 + N_i$ in the domain $N_{i+1}\big/N_i$, which proves injectivity. $\blacksquare$

Take an element $x_1^{(1)} \in N_{d}\setminus N_{d-1}$, which is non-empty by the definition of the index $d$. This element $x_1^{(1)}$ naturally corresponds to the element $x_1^{(1)}+N_{d-1}$ in the quotient space $N_d \big/ N_{d-1}$. Applying the map $\phi$ to it, we get

$$ \phi(x_1^{(1)}+N_{d-1}) = (A-\lambda I)x_1^{(1)} + N_{d-2}. $$

Since $x_1^{(1)} \notin N_{d-1}$, the injectivity of $\phi$ implies that $(A-\lambda I)x_1^{(1)} \notin N_{d-2}$. Define $x_1^{(2)} := (A-\lambda I)x_1^{(1)}$, so that $x_1^{(2)} \in N_{d-1}\setminus N_{d-2}$. Repeating this process, after $d$ steps we reach $x_1^{(d)} \in N_1$, a genuine eigenvector, at which point the process stops. The outcome is a chain:

$$ x_1^{(1)} \to x_1^{(2)} \to x_1^{(3)}\to \cdots \to x_1^{(d)}, $$

from which we can define a basis

$$ \mathcal{B}_1 = \{x_1^{(d)}, x_1^{(d-1)},...,x_1^{(1)}\}. \tag{8} $$

The size of $\mathcal{B}_1$ is $d$, which corresponds to the index of the eigenvalue $\lambda$. Moreover, since $(A-\lambda I)x_1^{(i)} = x_1^{(i+1)}$, we have

$$ Ax_1^{(i)} = x_1^{(i+1)} + \lambda x_1^{(i)}, \tag{9} $$

so $\mathcal{B}_1$ spans a $d$-dimensional `invariant subspace` under $A$. If we extend $\mathcal{B}_1$ to a basis $\mathcal{B}$ for the whole space, then by (9) the top left $d\times d$ block of the new matrix $[A]_{\mathcal{B}}^{\mathcal{B}}$ must be of the form

$$ \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ 0 & 0 & \lambda & \cdots & 0 \\ \vdots & \vdots &\vdots & \ddots & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{bmatrix} \tag{10} $$

This is called a `Jordan block` corresponding to the chain starting at $x_1^{(1)}$. Now we start from a set of vectors in $N_d \setminus N_{d-1}$ whose cosets form a basis for the quotient space $N_d \big/ N_{d-1}$:

$$ \mathcal{C}_d = \{x_1^{(1)},x_2^{(1)},...,x_{l_d}^{(1)}\}. $$

Apply the injective map $\phi$, that is, multiply by $A-\lambda I$, to each element to obtain a set of linearly independent vectors $x_1^{(2)},x_2^{(2)},...,x_{l_d}^{(2)}$ in $N_{d-1}\setminus N_{d-2}$. Extend this to a full set of coset representatives for $N_{d-1}\big/ N_{d-2}$:

$$ \mathcal{C}_{d-1} = \{x_1^{(2)},x_2^{(2)},...,x_{l_d}^{(2)}, ...,x_{l_{d-1}}^{(2)}\}. $$

Repeating this process $d$ times, we obtain the following diagram.

$$ \begin{array}{ccccccccc} \mathcal{C}_d & \to & \mathcal{C}_{d-1} & \to & \mathcal{C}_{d-2} & \to & \cdots & \to & \mathcal{C}_1 \\ x_1^{(1)} & \to & x_1^{(2)} & \to & x_1^{(3)} & \to & \cdots &\to & x_1^{(d)} \\ x_2^{(1)} & \to & x_2^{(2)} & \to & x_2^{(3)} & \to & \cdots &\to & x_2^{(d)} \\ \vdots & & \vdots& & \vdots& &\vdots & & \vdots \\ x_{l_d}^{(1)} & \to & x_{l_d}^{(2)} &\to &x_{l_d}^{(3)}& \to &\cdots & \to &x_{l_d}^{(d)}\\ &&\vdots&&\vdots&&\vdots&&\vdots\\ &&x_{l_{d-1}}^{(2)}&\to&x_{l_{d-1}}^{(3)}&\to&\cdots&\to&x_{l_{d-1}}^{(d)} \\ &&&&\vdots&&\vdots&&\vdots\\ &&&&x_{l_{d-2}}^{(3)}&\to&\cdots&\to&x_{l_{d-2}}^{(d)}\\ &&&&&&\vdots&&\vdots \\ &&&&&&\ddots&&\vdots \\ &&&&&&&&x_{l_1}^{(d)}, \end{array} \tag{11} $$

where $\mathcal{C}_1$ is a basis for the eigenspace $N_1$, and $\mathcal{C}_i$ is a basis for $N_{i} \setminus N_{i-1}$, $i=2,...,d$.

From this diagram, we can make the following observations:

- The length of the first row is the index of $\lambda$, which is $d$.
- Each row corresponds to an invariant subspace of $\mathbb{C}^n$ and a Jordan block.
- The number of rows is equal to the geometric multiplicity of $\lambda$. The last column consists of all the genuine eigenvectors.
- The rightmost $k$ columns form a basis for $N_k(\lambda) = N_{(A-\lambda I)^k}$.
- The entire array consists of all the generalized eigenvectors, which form a basis for $N_d(\lambda)$.

In general, suppose a matrix has distinct eigenvalues $\lambda_1,...,\lambda_k$. For each $\lambda_i$, we find all of its generalized eigenvectors as in (11). Let $J_i$ denote the block corresponding to the $i$th eigenvalue $\lambda_i$, and let $J_{ij}$ denote the Jordan block corresponding to the $j$th row of the array for $\lambda_i$. The matrix $A$ can now be "diagonalized" into the `Jordan normal form`:

$$ J(A) = \begin{bmatrix} J_{1} & \\ & J_{2} & \\ & & \ddots & \\ & & & J_{k} \end{bmatrix}, \quad J_i = \begin{bmatrix} J_{i1} & \\ & J_{i2} & \\ & & \ddots & \\ & & & J_{i,l_i} \end{bmatrix}, $$

where $l_i$ is the geometric multiplicity of $\lambda_i$. The columns of the change of basis matrix $S$ consist of all the vectors in the array (11), over all eigenvalues, arranged in the following order:

$$ S = \big[ x_1^{(d)}\st \cdots \st x_1^{(1)} \st x_2^{(d)} \st \cdots \big]. $$
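
For comparison, SymPy (assuming it is available) computes Jordan normal forms exactly; here it recovers the $2\times 2$ example worked out earlier:

```python
from sympy import Matrix

# SymPy's jordan_form returns S and J with A = S * J * S**(-1).
A = Matrix([[3, 2],
            [-2, -1]])
S, J = A.jordan_form()

assert J == Matrix([[1, 1], [0, 1]])  # single Jordan block for lambda = 1
assert A == S * J * S.inv()
```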

The construction of the Jordan normal form leads to the following result.

**Theorem.** Two matrices $A$ and $B$ are similar if and only if they have the same eigenvalues, and for each eigenvalue $\lambda$ and each $k=1,2,...$,$$\dim N_{(A-\lambda I)^k} = \dim N_{(B-\lambda I)^k}.$$

*Proof* $(\Rightarrow)$ If $A$ and $B$ are similar, then $A=SBS^{-1}$ for some invertible matrix $S$. Then by the binomial theorem,

$$ \begin{aligned} (A-\lambda I)^k &= \sum_{i=0}^k \binom{k}{i}(-\lambda)^{k-i} A^i \\ &= \sum_{i=0}^k \binom{k}{i}(-\lambda)^{k-i} (SBS^{-1})^i \\ &= S\left[\sum_{i=0}^k \binom{k}{i}(-\lambda)^{k-i} B^i\right]S^{-1} \\ &= S(B-\lambda I)^k S^{-1}. \end{aligned} $$

Hence $(A-\lambda I)^k$ and $(B-\lambda I)^k$ are similar, so their null spaces have equal dimensions.

$(\Leftarrow)$ If $A$ and $B$ have the same eigenvalues and the null spaces above have matching dimensions, then for each eigenvalue the arrays (11) have the same shape. Hence $A$ and $B$ are similar to the same Jordan normal form, and therefore to each other. $\blacksquare$

A natural consequence of this theorem is the following.

CorollaryEvery square matrix is similar to its transpose.

*Proof* $A$ and $A\tr$ have the same characteristic polynomial since

$$ \det(A\tr-\lambda I) = \det(A-\lambda I)\tr = \det(A-\lambda I). $$

Also, since the matrices $(A-\lambda I)^k$ and $(A\tr - \lambda I)^k$ are transposes of each other, they have the same rank and hence the same nullity. The result now follows from the previous theorem. $\blacksquare$
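
The rank argument in the proof can be checked numerically on a small made-up example (a single Jordan block):

```python
import numpy as np

# (A - lam*I)^k and its transpose have the same rank, hence the same
# nullity, for every k. Illustrated on a 3x3 Jordan block for lambda = 2.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
lam = 2.0

for k in range(1, 4):
    M = np.linalg.matrix_power(A - lam * np.eye(3), k)
    assert np.linalg.matrix_rank(M) == np.linalg.matrix_rank(M.T)
```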