April 11, 2019 $\newcommand{\bs}{\boldsymbol}$ $\newcommand{\argmin}[1]{\underset{\bs{#1}}{\text{arg min}}\,}$ $\newcommand{\argmax}[1]{\underset{\bs{#1}}{\text{arg max}}\,}$ $\newcommand{\tr}{^{\top}}$ $\newcommand{\norm}[1]{\left|\left|\,#1\,\right|\right|}$ $\newcommand{\given}{\,|\,}$ $\newcommand{\st}{\,\big|\,}$ $\newcommand{\E}[1]{\mathbb{E}\left[#1\right]}$ $\newcommand{\P}[1]{\mathbb{P}\left(#1\right)}$ $\newcommand{\abs}[1]{\left|#1\right|}$ $\newcommand{\blue}[1]{{\color{blue} {#1}}}$ $\newcommand{\red}[1]{{\color{red} {#1}}}$ $\newcommand{\orange}[1]{{\color{orange} {#1}}}$ $\newcommand{\pfrac}[2]{\frac{\partial #1}{\partial #2}}$

For a general vector space, we have the concept of duality. If $X$ is a vector space with a basis $x_1,..., x_n$, there exists the corresponding dual basis $l_1,...,l_n$ in the dual space $X'$ that satisfies

$$ l_j(x_i) =: (l_j, x_i) = \begin{cases} 1 & i=j \\ 0 & i\neq j \end{cases},\tag{1} $$

where $(\cdot, \cdot)$ is a bilinear form that is not symmetric in general. By (1), when an element $x\in X$ is written as a linear combination of the basis, its coefficients can be expressed in terms of the dual basis:

$$ x = \sum_{i=1}^n a_i x_i. \tag{2} $$

Applying $l_j$ to $x$ for every $j$, we conclude that

$$ a_i = (l_i, x). $$
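This is easy to check numerically. A minimal sketch (assuming NumPy; the basis below is a made-up example): if the columns of a matrix hold a basis of $\mathbb{R}^2$, the rows of its inverse act as the dual basis, and the coefficients $a_i = (l_i, x)$ recover $x$.

```python
import numpy as np

# Columns of B form a basis x_1, x_2 of R^2 (a hypothetical example basis).
B = np.array([[1.0, 1.0],
              [0.0, 2.0]])

# Rows of B^{-1} act as the dual basis: row j applied to column i gives delta_ij.
L = np.linalg.inv(B)
assert np.allclose(L @ B, np.eye(2))  # (1): l_j(x_i) = delta_ij

# Any x is recovered from its dual-basis coefficients a_i = (l_i, x), as in (2).
x = np.array([3.0, -1.0])
a = L @ x                      # coefficients a_i
assert np.allclose(B @ a, x)   # x = sum_i a_i x_i
```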

Hence, we have the isomorphism $X \cong X'$ via the mapping $x_i \mapsto l_i$. This isomorphism depends on the choice of the basis $x_1,...,x_n$. If we go a step further, the dual of the dual $X''$ is isomorphic to $X$ in a way that is independent of the choice of the basis. For any $l'\in X''$, we can choose a corresponding $x\in X$, such that for any $l\in X'$,

$$ (l', l) = (l, x). \tag{2'} $$

So the element $l' \in X''$ is uniquely `represented` by the element $x\in X$, leading to a natural identification $X''=X$. This idea of representation will appear again. Let $T:X\to U$ be a linear map. Its `adjoint map` $T':U'\to X'$ is defined so that the value $T'l\in X'$ satisfies the relation

$$ (T'l, x) = (l, Tx), \tag{3} $$

for any $l\in U'$ and $x\in X$. This is illustrated in the following diagram.

Let $Y$ be a subspace of $X$. Recall that the `annihilator` of $Y$ is defined by

$$ Y^\perp := \{l\in X' \st l(y)=0 \text{ for all } y\in Y\}. $$

Previously, we proved that

$$ \begin{aligned} R_T^\perp &= N_{T'} \\ R_T &= N_{T'}^\perp. \end{aligned} \tag{3'} $$

The Euclidean structure is furnished by defining a real-valued function of two vectors, called the `scalar product` and denoted $(x, y)$, which has the following properties:

- (Bilinear) $(x,y)$ is a bilinear form.
- (Symmetric) $(x,y) = (y,x)$.
- (Positive) $(x,x)\geq 0$ and the equality holds iff $x=0$.

A vector space equipped with a scalar product is called an `inner product space`. In an inner product space, we can talk about the notion of length. Define the Euclidean `norm` by

$$ \norm{x} = (x,x)^{1/2}, \tag{4} $$

which can be shown to satisfy the three properties that define a norm:

- (Triangle inequality) $\norm{x+y} \leq \norm{x} + \norm{y}$
- (Homogeneous) $\norm{ax} = |a|\norm{x}$
- (Positive) $\norm{x} = 0$ iff $x= 0$.

Using the bilinearity and symmetry of the scalar product together with (4), we can write

$$ \norm{x-y}^2 = (x-y,\, x-y) = \norm{x}^2 - 2(x,y) + \norm{y}^2, \tag{5} $$

which shows that $x$ and $y$ satisfy the Pythagorean identity if and only if $(x,y)=0$. In this case, $x$ and $y$ are called `orthogonal`. Let $X$ be a finite-dimensional linear space with the Euclidean structure. A basis $x_1,...,x_n$ is called `orthonormal` if

$$ (x_i, x_j) = \begin{cases} 1 & i=j \\ 0 & i\neq j. \end{cases}\tag{6} $$

It can be shown, using the `Gram-Schmidt` process, that every finite-dimensional inner product space with basis $y_1,...,y_n$ has an orthonormal basis $x_1,...,x_n$, formed recursively by

$$ x_k = c\left(y_k-\sum_{j=1}^{k-1} c_jx_j \right), $$

where $c_j = (y_k, x_j)$ and $c$ is the normalizing constant to ensure $\norm{x_k}=1$.
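The recursion above can be sketched in code (assuming NumPy; the input basis is a made-up example):

```python
import numpy as np

def gram_schmidt(Y):
    """Orthonormalize the columns of Y, assumed linearly independent."""
    X = np.zeros_like(Y, dtype=float)
    for k in range(Y.shape[1]):
        v = Y[:, k].copy()
        # Subtract the projections c_j = (y_k, x_j) onto the earlier x_j.
        for j in range(k):
            v -= (Y[:, k] @ X[:, j]) * X[:, j]
        X[:, k] = v / np.linalg.norm(v)  # normalize so ||x_k|| = 1
    return X

Y = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
X = gram_schmidt(Y)
assert np.allclose(X.T @ X, np.eye(3))  # (6): the columns are orthonormal
```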

We love the orthonormal basis because if $x=\sum a_i x_i$ and $y=\sum b_i x_i$, where $x_1,...,x_n$ is an orthonormal basis, then the scalar product between $x$ and $y$ becomes

$$ (x,y) = \sum_{i=1}^n \sum_{j=1}^n a_i b_j (x_i, x_j) = \sum_{i=1}^n a_i b_i, $$

which is the usual dot product in $\mathbb{R}^n$ provided that the orthonormal basis is interpreted as the Cartesian coordinate system of $\mathbb{R}^n$. Since the scalar product is independent of the choice of basis, we can choose a special Cartesian coordinate system so that the first axis passes through $x$, and $y$ is contained in the plane spanned by the first two axes. Under this coordinate system, $x$ and $y$ can be expressed as

$$ \begin{aligned} x &= (\norm{x}, 0, ..., 0) \\ y &= (\norm{y}\cos\theta, ... ), \end{aligned} $$

so their dot product becomes

$$ (x,y) = \norm{x}\norm{y} \cos\theta. \tag{7} $$

This agrees with the idea of orthogonality in $\mathbb{R}^n$: $x$ and $y$ are orthogonal iff $\cos\theta=0$, i.e., $\theta=\pi/2$. In addition, we can rewrite (5) as the classical `law of cosines`:

$$ \norm{x-y}^2 = \norm{x}^2 + \norm{y}^2 - 2\norm{x}\norm{y}\cos\theta. $$
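As a numerical sanity check of (7) and the law of cosines (assuming NumPy; the vectors are arbitrary examples):

```python
import numpy as np

x = np.array([3.0, 0.0, 4.0])
y = np.array([1.0, 2.0, 2.0])

nx, ny = np.linalg.norm(x), np.linalg.norm(y)
cos_theta = (x @ y) / (nx * ny)   # (7): (x, y) = ||x|| ||y|| cos(theta)

# Law of cosines: ||x - y||^2 = ||x||^2 + ||y||^2 - 2 ||x|| ||y|| cos(theta)
lhs = np.linalg.norm(x - y) ** 2
rhs = nx**2 + ny**2 - 2 * nx * ny * cos_theta
assert np.isclose(lhs, rhs)
```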

Let $x_1,...,x_n$ be an orthonormal basis of an inner product space $X$, so any $x\in X$ can be written as a linear combination:

$$ x = \sum_{i=1}^n a_i x_i. \tag{8} $$

Taking the inner product of $x$ with each basis element $x_i$, we have that

$$a_i = (x, x_i).$$

This is essentially the same as (2), except now we have an orthonormal basis instead of the dual basis. Equation (8) shows that the mapping

$$ x \mapsto (a_1,...,a_n) $$

carries the inner product space $X$ to the Euclidean space $\mathbb{R}^n$, with all of its glorious properties. Furthermore, the similarity between (2) and (8) shows that an orthonormal basis is essentially its own dual basis, which brings the following theorem.

**Theorem 1.** Let $X$ be an inner product space. Every linear functional $l\in X'$ can be represented as a scalar product

$$ l(x) = (x,y) $$

for some element $y\in X$. Moreover, this $y$ can be written in terms of the basis:

$$ y = \sum_{i=1}^n l(x_i) x_i. \tag{9}$$

*Proof* Pick an orthonormal basis $x_1,...,x_n$ in $X$, and associate it with the dual basis $l_1,...,l_n$ in $X'$. Under the canonical isomorphism between $X$ and $X''$, the basis $x_1,...,x_n$ is also a basis for $X''$, namely the dual basis of $l_1,...,l_n$. Hence, by (2) and (2'), the linear functional $l$ can be expressed as

$$ l=\sum_{i=1}^n (x_i, l)l_i = \sum_{i=1}^n l(x_i)l_i. $$

By (8), we can express $x$ as

$$ x = \sum_{j=1}^n (x, x_j) x_j. $$

Taking the scalar product, we have that

$$ \begin{aligned} l(x) &= \left(\sum_{i=1}^n l(x_i) l_i, \sum_{j=1}^n (x, x_j)x_j\right) \\ &= \sum_{i=1}^n l(x_i)(x, x_i) \\ &= \left(x, \sum_{i=1}^n l(x_i) x_i\right). \end{aligned} \tag{9'} $$

Setting $y=\sum_{i=1}^n l(x_i)x_i$ proves the theorem. $\blacksquare$

Theorem 1 is also known as the `Riesz representation theorem`. It implies the following important result.

**Corollary 1.** The mapping $l\mapsto y$ defines a canonical isomorphism between the Euclidean space $X$ and its dual $X'$.
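Theorem 1 can be illustrated numerically (assuming NumPy; the functional $l$ below is a hypothetical example): formula (9) assembles the representer $y$ from the values of $l$ on an orthonormal basis.

```python
import numpy as np

# A hypothetical linear functional on R^3: l(x) = 2 x_1 - x_2 + 5 x_3.
w = np.array([2.0, -1.0, 5.0])
l = lambda x: w @ x

# Any orthonormal basis works; here the standard basis of R^3.
E = np.eye(3)

# (9): the representer y = sum_i l(x_i) x_i.
y = sum(l(E[:, i]) * E[:, i] for i in range(3))

x = np.array([1.0, 4.0, -2.0])
assert np.isclose(l(x), x @ y)  # l(x) = (x, y)
```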

Let $X$ be a finite-dimensional inner product space, and let $Y$ be a subspace of $X$. We define the `orthogonal complement` of $Y$, denoted $Y^\perp$, by

$$ Y^\perp := \{z\in X \st (z, y)=0 \text{ for all } y\in Y\}. $$

Under the canonical isomorphism (Theorem 1), the orthogonal complement is consistent with the definition of the annihilator, since the element $z \in X$ corresponds to $l\in X'$.

Let $T:X\to U$ be a linear map where $X$ and $U$ are inner product spaces. Given $u\in U$, we define the adjoint of $T$ as $T^*:U\to X$ by

$$ T^*u := \sum_{i=1}^n (Tx_i, u) x_i, \tag{10} $$

where $x_1,...,x_n$ is an orthonormal basis of $X$. This is well defined since the right-hand side is a linear function of $u$. Moreover, since $(Tx, u)$ is a linear function of $x$, the Riesz representation theorem gives

$$ (Tx, u) = (x, T^*u). \tag{11} $$

Note that (11) and (10) can both be used to define the adjoint map $T^*$. Equation (11) is very similar to Equation (3), where under the canonical isomorphism, $l$ is replaced with $x$, and $T'$ is replaced with $T^*$. Using the symmetry of the scalar product, we can show that $T^{**}=T$. Using (11), we can show the analogue of (3'):
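As a sketch of how (10) and (11) interact (assuming NumPy; the map and vectors are random examples), building $T^*u$ directly from (10) with the standard basis reproduces the identity (11):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3))   # a linear map R^3 -> R^4 as a matrix

# (10): T* u = sum_i (T x_i, u) x_i, with x_i the standard basis of R^3.
def adjoint(T, u):
    n = T.shape[1]
    X = np.eye(n)
    return sum(((T @ X[:, i]) @ u) * X[:, i] for i in range(n))

x = rng.standard_normal(3)
u = rng.standard_normal(4)
assert np.isclose((T @ x) @ u, x @ adjoint(T, u))  # (11): (Tx, u) = (x, T*u)
```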

**Theorem 2.** Let $X, U$ be inner product spaces, and let $T:X\to U$ be a linear map. Then

$$ \begin{aligned} R_T^\perp &= N_{T^*} \\ R_T &= N_{T^*}^\perp. \end{aligned} \tag{12} $$

*Proof* Since $R_T^\perp$ is a subspace of $U$, by (11), we have

$$ \begin{aligned} R_T^\perp &= \{z \in U \st (z, u)=0 \text{ for all } u\in R_T\} \\ &= \{z\in U \st (z, Tx) = 0 \text{ for all } x \in X\} \\ &= \{z\in U \st (T^*z, x)=0 \text{ for all } x\in X\} \\ &= \{z\in U \st T^*z=0\} & [\text{Take } x=T^*z]\\ &= N_{T^*}. \end{aligned} $$

This proves the first equality. Taking $\perp$ on both sides, and using the fact that $R_T^{\perp\perp}=R_T$, we have proved the second equality. $\blacksquare$

Theorem 2 shows that if $T$ is a matrix, then the column space $R_T$ is the orthogonal complement to the left null space $N_{T^*}$. Since $T^{**}=T$, the row space $R_{T^*}$ is the orthogonal complement of the null space $N_T$.
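The column-space picture can be checked on a small example (assuming NumPy; the matrix is made up): a vector in the left null space $N_{T^*}$ is orthogonal to every column of $T$.

```python
import numpy as np

T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# z lies in the left null space N_{T*}: it satisfies T* z = 0.
z = np.array([1.0, 1.0, -1.0])
assert np.allclose(T.T @ z, 0.0)

# (12): such z is orthogonal to the range of T, i.e. to every column of T.
assert np.allclose(z @ T, 0.0)
```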

**Theorem 3.** Let $A:\mathbb{R}^n \to \mathbb{R}^m$ be a linear map, where $\mathbb{R}^n$ and $\mathbb{R}^m$ have their standard Euclidean structures. If we interpret $A$ and $A^*$ as matrices, they are `transposes` of each other. That is, $A_{ij} = A^*_{ji}$.

*Proof* Let $u=\bs{e}^m_j \in \mathbb{R}^m$ be the $j$th standard basis vector of $\mathbb{R}^m$, and let $\bs{e}^n_i\in \mathbb{R}^n$ be the $i$th standard basis vector of $\mathbb{R}^n$. Using Equation (10), we have

$$ \begin{aligned} A^*\bs{e}^m_j &= \sum_{i=1}^n (A\bs{e}^n_i, \bs{e}^m_j) \bs{e}^n_i \\ &= \sum_{i=1}^n A_{ji} \bs{e}^n_i \\ &= (A_{j1}, A_{j2},...,A_{jn})\tr. \end{aligned} $$

The left-hand side is the $j$th column of the matrix $A^*$, which is precisely the $j$th row of $A$. $\blacksquare$
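Theorem 3 can be verified by assembling $A^*$ column by column from (10) (assuming NumPy; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # a map R^3 -> R^2
m, n = A.shape

# Column j of A* is A* e_j^m = sum_i (A e_i^n, e_j^m) e_i^n, as in (10).
A_star = np.zeros((n, m))
for j in range(m):
    for i in range(n):
        A_star[i, j] = A[:, i] @ np.eye(m)[:, j]   # = A[j, i]

assert np.allclose(A_star, A.T)  # A* is the transpose of A
```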