Here, we have two vectors, $a, b \in \mathbb{R}^m$. They lie in the plane defined by $\text{Span}(\{a, b\})$, which is a two-dimensional space (unless $a$ and $b$ are collinear, that is, one is a scalar multiple of the other).
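We split $b$ into two components, $b = z + w$, where
$z = \chi a \text{ with } \chi \in \mathbb{R}$ is the component in the direction of $a$, and
$w$ is the component orthogonal to $a$, so that $a^T w = 0$.
Substituting $w = b - z$ gives $0 = a^T w = a^T(b - z) = a^T (b - \chi a)$,
or, equivalently, $a^T a \chi = a^T b$.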
Provided $a \ne 0$, $\chi = (a^T a)^{-1}(a^T b)$.
Thus, the component of $b$ in the direction of $a$ is given by
$z = \chi a = (a^T a)^{-1} (a^T b) a = a(a^T a)^{-1}(a^T b) = [a(a^T a)^{-1}a^T ] b = [\frac{1}{a^T a} a a^T ] b$
The matrix that projects onto $\text{Span}(\{a\})$ is therefore $a(a^T a)^{-1}a^T = \frac{1}{a^T a} a a^T$.
The component of $b$ orthogonal (perpendicular) to $a$ is given by
$w = b - z = b - (a(a^T a)^{-1}a^T ) b = Ib - (a(a^T a)^{-1}a^T )b = (I - a(a^T a)^{-1}a^T )b$
The matrix that projects onto the space orthogonal to $\text{Span}(\{a\})$ is therefore $I - a(a^T a)^{-1}a^T = I - \frac{1}{a^T a} a a^T$.
Set $v^T = (a^T a)^{-1}a^T$, so that $z = (a v^T) b$ and $w = (I - a v^T) b$.
Given $a, x \in \mathbb{R}^m$, we can use $P_a(x)$ and $P_a^{\perp}(x)$ to represent the projections of the vector $x$ onto ${\rm Span}(\{a\})$ and ${\rm Span}(\{a\})^{\perp}$, respectively.
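The two projections can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `dot`, `project_onto`, and `project_orthogonal` are hypothetical, vectors are assumed to be plain lists of floats, and the sample vectors are chosen only for the example. Note that the code computes the scalar $\chi = a^T x / a^T a$ first rather than forming the $m \times m$ matrix $a (a^T a)^{-1} a^T$.

```python
def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto(a, x):
    """Component of x in the direction of a: chi * a with chi = a^T x / a^T a."""
    ata = dot(a, a)
    if ata == 0:
        raise ValueError("a must be a nonzero vector")
    chi = dot(a, x) / ata
    return [chi * ai for ai in a]


def project_orthogonal(a, x):
    """Component of x orthogonal to a: x minus its projection onto a."""
    z = project_onto(a, x)
    return [xi - zi for xi, zi in zip(x, z)]


# Illustrative vectors (not from the text).
a = [1.0, 1.0, 0.0]
b = [2.0, 0.0, 3.0]
z = project_onto(a, b)        # [1.0, 1.0, 0.0]
w = project_orthogonal(a, b)  # [1.0, -1.0, 3.0]
```

Here $a^T b = 2$ and $a^T a = 2$, so $\chi = 1$, and the two components satisfy $z + w = b$ and $a^T w = 0$.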
Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns and vector $b \in \mathbb{R}^m$ :
Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, there exist a matrix $Q \in \mathbb{R}^{m \times n}$ with mutually orthonormal columns and an upper triangular matrix $R \in \mathbb{R}^{n \times n}$ such that $A = QR$. The vector $\hat{x}$ that is the best solution (in the linear least-squares sense) to $Ax \approx b$ is given by $\hat{x} = R^{-1} Q^T b$, that is, by the solution of the triangular system $R \hat{x} = Q^T b$.
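A short derivation shows how the QR factors enter the least-squares solution. It assumes the normal-equations form $\hat{x} = (A^T A)^{-1} A^T b$ and uses only $A = QR$ and $Q^T Q = I$:

$$\begin{aligned} \hat{x} &= (A^T A)^{-1} A^T b \\ &= ((QR)^T QR)^{-1} (QR)^T b \\ &= (R^T Q^T Q R)^{-1} R^T Q^T b \\ &= (R^T R)^{-1} R^T Q^T b \\ &= R^{-1} R^{-T} R^T Q^T b \\ &= R^{-1} Q^T b \end{aligned}$$

Here $R$ is nonsingular because the columns of $A$ are linearly independent, so $R^{-1}$ and $R^{-T}$ exist.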
An algorithm for computing the QR factorization is given by
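One standard choice is classical Gram-Schmidt orthogonalization; whether it matches the algorithm intended here is an assumption. The sketch below is plain Python with hypothetical names (`gram_schmidt_qr`, `solve_least_squares`) and stores the matrix as a list of its columns, which is a convenience of this example only. It also shows the least-squares step: rather than forming $R^{-1}$, it solves $R \hat{x} = Q^T b$ by back substitution.

```python
import math


def gram_schmidt_qr(columns):
    """Classical Gram-Schmidt on a list of column vectors of A.

    Returns (q_cols, R): the orthonormal columns of Q and the upper
    triangular R (as a list of rows) such that A = Q R.
    """
    n = len(columns)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a_j in enumerate(columns):
        v = list(a_j)
        # Subtract the components of a_j along the previous q_i.
        for i, q_i in enumerate(q_cols):
            R[i][j] = sum(qk * ak for qk, ak in zip(q_i, a_j))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q_i)]
        # What remains is orthogonal to all previous q_i; normalize it.
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        if R[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q_cols.append([vk / R[j][j] for vk in v])
    return q_cols, R


def solve_least_squares(columns, b):
    """Least-squares solution of A x ~ b via A = Q R and R x = Q^T b."""
    q_cols, R = gram_schmidt_qr(columns)
    n = len(columns)
    y = [sum(qk * bk for qk, bk in zip(q, b)) for q in q_cols]  # y = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


# Illustrative data: fit a line c + d*t to the points (0, 6), (1, 0), (2, 0).
cols = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
x_hat = solve_least_squares(cols, [6.0, 0.0, 0.0])  # close to [5.0, -3.0]
```

Classical Gram-Schmidt is easy to read but can lose orthogonality in floating point when the columns are nearly dependent; modified Gram-Schmidt or Householder-based QR are the usual more robust alternatives.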
Any matrix $A \in \mathbb{R}^{m \times n}$ can be written as the product of three matrices, the Singular Value Decomposition (SVD): $$A = U \Sigma V^T$$ where
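If we partition
$$U = \left( U_L \mid U_R \right), \quad V = \left( V_L \mid V_R \right), \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & \Sigma_{BR} \end{pmatrix},$$
where $U_L$ and $V_L$ have $k$ columns and $\Sigma_{TL}$ is $k \times k$, then $U_L \Sigma_{TL} V_L^T$ is the “best” rank-$k$ approximation to the matrix being decomposed. Writing that matrix as $B = U \Sigma V^T$ and seeking factors with $B \approx AW^T$ (here $A$ denotes the left factor with $k$ columns, not the decomposed matrix), the “best” rank-$k$ approximation is given by the choices $A = U_L$ and $W = V_L \Sigma_{TL}$, so that $W^T = \Sigma_{TL} V_L^T$.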
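As a small worked example (the matrix and its factors are chosen here for illustration and do not come from the text), take $k = 1$ and

$$U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix}, \quad V = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad B = U \Sigma V^T = \begin{pmatrix} 2\sqrt{2} & 2\sqrt{2} \\ \sqrt{2} & -\sqrt{2} \end{pmatrix}.$$

Keeping only the first column of $U$ and $V$ and the leading $1 \times 1$ block of $\Sigma$ gives

$$U_L \Sigma_{TL} V_L^T = \begin{pmatrix} 1 \\ 0 \end{pmatrix} (4) \, \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \end{pmatrix} = \begin{pmatrix} 2\sqrt{2} & 2\sqrt{2} \\ 0 & 0 \end{pmatrix}.$$

The same product results from the two rank-1 factors $U_L = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $V_L \Sigma_{TL} = 2\sqrt{2} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, since $(V_L \Sigma_{TL})^T = \Sigma_{TL} V_L^T$. The error $B - U_L \Sigma_{TL} V_L^T$ has Frobenius norm $2$, which is exactly the discarded singular value.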
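Here $U^T U = I$, $V$ is $n \times n$ with $V^T V = V V^T = I$, and $\Sigma$ is $n \times n$, diagonal, and nonsingular, as holds for the reduced SVD of a matrix $A$ with linearly independent columns:
$\begin{aligned}\hat{x} &= (A^TA)^{-1}A^T b \\ &= ((U \Sigma V^T)^T U \Sigma V^T)^{-1} (U \Sigma V^T)^T b \\ &= (V \Sigma U^T U \Sigma V^T)^{-1} V \Sigma U^T b \\ &= (V \Sigma^2 V^T)^{-1} V \Sigma U^T b \\ &= V \Sigma^{-2} V^T V \Sigma U^T b \\ &= V \Sigma^{-1} U^T b \end{aligned}$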