Week 11 - Orthogonal Projection, Low Rank Approximation, and Orthogonal Bases

1. Rank-k Approximation

1.1. Projecting a Vector onto a Subspace

  • Here, we have two vectors, $a, b \in \mathbb{R}^m$. They exist in the plane defined by $\text{Span}(\{a, b\})$, which is a two-dimensional space (unless $a$ and $b$ point in the same direction).
  • $b = z + w$
  • $z = \chi a$ with $\chi \in \mathbb{R}$
  • $a^T w = 0$
  • $0 = a^T w = a^T(b - z) = a^T(b - \chi a)$
  • $a^T a \chi = a^T b$.
  • Provided $a \ne 0$, $\chi = (a^T a)^{-1}(a^T b)$.
  • Thus, the component of $b$ in the direction of $a$ is given by $z = \chi a = (a^T a)^{-1}(a^T b)\, a = a (a^T a)^{-1}(a^T b) = \left[ a (a^T a)^{-1} a^T \right] b = \left[ \frac{1}{a^T a} a a^T \right] b$
    • Notice $(a^T a)^{-1}$ and $a^T b$ are both scalars.
    • We say that, given vector $a$, the matrix that projects any given vector $b$ onto the space spanned by $a$ is given by $a (a^T a)^{-1} a^T = \frac{1}{a^T a} a a^T$
  • The component of $b$ orthogonal (perpendicular) to $a$ is given by $w = b - z = b - \left( a (a^T a)^{-1} a^T \right) b = I b - \left( a (a^T a)^{-1} a^T \right) b = \left( I - a (a^T a)^{-1} a^T \right) b$
    • We say that, given vector $a$, the matrix that projects any given vector $b$ onto the space orthogonal to $a$ (i.e., onto $\text{Span}(\{a\})^{\perp}$) is given by $I - a (a^T a)^{-1} a^T = I - \frac{1}{a^T a} a a^T$
  • Set $v^T = (a^T a)^{-1} a^T$. Then:
    • $z = (a v^T) b$
    • $w = (I - a v^T) b$
    • Notice $(I - a v^T)$ is a rank-1 update to the identity matrix.
  • Given $a, x \in \mathbb{R}^m$, we can use $P_a(x)$ and $P_a^{\perp}(x)$ to represent the projection of vector $x$ onto $\text{Span}(\{a\})$ and $\text{Span}(\{a\})^{\perp}$, respectively.
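
A minimal NumPy sketch of this projection (not part of the original notes; the vectors $a$ and $b$ are made up for illustration):

```python
import numpy as np

# Project b onto Span({a}) and its orthogonal complement, following
# z = [ (1 / a^T a) a a^T ] b and w = b - z.
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 0.0])

chi = (a @ b) / (a @ a)        # chi = (a^T a)^{-1} (a^T b), a scalar
z = chi * a                    # component of b in the direction of a
w = b - z                      # component of b orthogonal to a

print(np.isclose(a @ w, 0.0))  # True: w is orthogonal to a
print(np.allclose(z + w, b))   # True: b = z + w
```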

  • Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns and vector $b \in \mathbb{R}^m$:

    • Component of $b$ in $\mathcal{C}(A)$: $u = A (A^T A)^{-1} A^T b$
    • Matrix that projects onto $\mathcal{C}(A)$: $A (A^T A)^{-1} A^T$
    • Component of $b$ in $\mathcal{C}(A)^{\perp} = \mathcal{N}(A^T)$: $w = b - A (A^T A)^{-1} A^T b = \left( I - A (A^T A)^{-1} A^T \right) b$
    • Matrix that projects onto $\mathcal{C}(A)^{\perp} = \mathcal{N}(A^T)$: $I - A (A^T A)^{-1} A^T$
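
The same idea for a matrix $A$, as a small sketch with made-up data; the normal equations are solved instead of forming $(A^T A)^{-1}$ explicitly:

```python
import numpy as np

# Project b onto C(A) via u = A (A^T A)^{-1} A^T b, and onto N(A^T) via w = b - u.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])              # linearly independent columns
b = np.array([1.0, 2.0, 4.0])

x = np.linalg.solve(A.T @ A, A.T @ b)   # solve (A^T A) x = A^T b
u = A @ x                               # component of b in C(A)
w = b - u                               # component of b in N(A^T)

print(np.allclose(A.T @ w, 0.0))        # True: w is orthogonal to every column of A
```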

1.2. Rank-k Approximation

  • "Best" rank-k approximation of BRm×nB \in \mathbb{R}^{m \times n} using the column space of ARm×kA \in \mathbb{R}^{m \times k} (pick kk columns in B to get A) with linearly independent columns: A(ATA)1ATB=AV, where V=(ATA)1ATBA(A^TA)^{-1}A^TB = AV, \text{ where } V = (A^TA)^{-1}A^TB
    • To calculate V=(ATA)1ATBV = (A^TA)^{-1}A^TB
    • First way is, to use LU factorization:
      • (ATA)V=(ATA)(ATA)1ATB(A^TA)V = (A^TA)(A^TA)^{-1}A^TB
      • (ATA)V=ATB(A^TA)V = A^TB
      • solve C=ATAC = A^TA and Y=ATBY = A^TB separately, then solve CV=Y C V = Y by LU factorization.
    • The second way is to use Cholesky factorization:
      • $(A^T A) V = A^T B$
      • Since $A^T A$ is a symmetric positive definite (SPD) matrix, we can factor it as $L L^T = A^T A$, where $L$ is lower triangular.
      • $L L^T V = A^T B$
      • Set $U = L^T V$.
      • Compute $Y = A^T B$.
      • Then solve $L U = Y$ to get $U$ (forward substitution).
      • Solve $L^T V = U$ to get $V$ (back substitution).
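
A sketch of the Cholesky route in Python/SciPy; the matrix $B$ and the choice of columns are made up for illustration:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 5))
A = B[:, :2]                              # pick k = 2 columns of B to form A

C = A.T @ A                               # SPD, since A has independent columns
Y = A.T @ B
L = cholesky(C, lower=True)               # C = L L^T
U = solve_triangular(L, Y, lower=True)    # forward substitution: L U = Y
V = solve_triangular(L.T, U, lower=False) # back substitution: L^T V = U

B_approx = A @ V                          # rank-k approximation of B
print(np.linalg.matrix_rank(B_approx))    # 2
```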

2. Orthonormal Bases

2.1. Orthonormal Vectors

  • Definition: Let $q_0, q_1, \ldots, q_{k-1} \in \mathbb{R}^m$. Then these vectors are (mutually) orthonormal if for all $0 \le i, j < k$: $q_i^T q_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise.} \end{cases}$
    • $\lVert q_i \rVert_2 = 1$
    • $q_i$ is orthogonal to $\{q_0, q_1, \ldots, q_{i-1}, q_{i+1}, \ldots, q_{k-1}\}$.

2.2. Gram-Schmidt orthogonalization (GS orthogonalization)

  • Definition: Transform a given set of basis vectors into a set of orthonormal vectors that form a basis for the same space.
  • Starting with linearly independent vectors $a_0, a_1, \ldots, a_{n-1} \in \mathbb{R}^m$, the following algorithm computes the mutually orthonormal vectors $q_0, q_1, \ldots, q_{n-1} \in \mathbb{R}^m$ such that $\text{Span}(\{a_0, a_1, \ldots, a_{n-1}\}) = \text{Span}(\{q_0, q_1, \ldots, q_{n-1}\})$:
  • The key to this transformation is:
    • We find the component $a_i^{\perp}$ of $a_i$ that is orthogonal to the vectors $q_0, q_1, \ldots, q_{i-1}$, then divide it by its length $\lVert a_i^{\perp} \rVert_2$ to obtain $q_i$.
      • $a_i^{\perp} = a_i - q_0^T a_i \, q_0 - q_1^T a_i \, q_1 - \ldots - q_{i-1}^T a_i \, q_{i-1}$
      • $= a_i - \rho_{0,i} q_0 - \rho_{1,i} q_1 - \ldots - \rho_{i-1,i} q_{i-1}$
  • Notice:
    • $\rho_{0,0} = \lVert a_0 \rVert_2$, $q_0 = a_0 / \rho_{0,0}$
      • Notice that $\text{Span}(\{a_0\}) = \text{Span}(\{q_0\})$ since $q_0$ is simply a scalar multiple of $a_0$.
    • $\rho_{0,1} = (q_0^T q_0)^{-1} q_0^T a_1 = q_0^T a_1$
      • $q_0$ is a unit vector, so $(q_0^T q_0)^{-1} = 1$.
      • $\rho_{0,1}$ plays the same role as $\chi$ in the projection of $b$ onto $a$ at the start of these notes.
    • $\rho_{0,2} = q_0^T a_2$
    • $\rho_{1,1} = \lVert a_1^{\perp} \rVert_2$, $q_1 = a_1^{\perp} / \rho_{1,1}$
      • $a_1^{\perp} = a_1 - \rho_{0,1} q_0$
    • $\rho_{1,2} = q_1^T a_2$
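
A Python sketch of classical Gram-Schmidt as described above (not part of the original notes); it also collects the $\rho_{i,j}$ values into an upper triangular matrix $R$, anticipating the QR factorization in the next section:

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A.

    Returns Q with mutually orthonormal columns and upper triangular R
    (entries rho_{i,j}) such that A = Q R. Assumes the columns of A are
    linearly independent."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for i in range(n):
        a_perp = A[:, i].astype(float)
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]    # rho_{j,i} = q_j^T a_i
            a_perp -= R[j, i] * Q[:, j]    # subtract the component along q_j
        R[i, i] = np.linalg.norm(a_perp)   # rho_{i,i} = ||a_i^perp||_2
        Q[:, i] = a_perp / R[i, i]         # q_i = a_i^perp / rho_{i,i}
    return Q, R
```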

2.3. The QR factorization

  • Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, there exists a matrix $Q \in \mathbb{R}^{m \times n}$ with mutually orthonormal columns and an upper triangular matrix $R \in \mathbb{R}^{n \times n}$ such that $A = QR$.
  • If one partitions $A = \left( a_0 \mid a_1 \mid \cdots \mid a_{n-1} \right)$, $Q = \left( q_0 \mid q_1 \mid \cdots \mid q_{n-1} \right)$, and lets $\rho_{i,j}$ denote the $(i,j)$ entry of $R$,
  • then $A = QR$ means $a_j = \rho_{0,j} q_0 + \rho_{1,j} q_1 + \cdots + \rho_{j,j} q_j$ for $j = 0, \ldots, n-1$,
  • and Gram-Schmidt orthogonalization (the Gram-Schmidt process) in the above algorithm computes the columns of $Q$ and the elements of $R$.
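
A quick check of the factorization on made-up data, reusing the `gram_schmidt` sketch from the previous section:

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = gram_schmidt(A)                   # sketch defined in Section 2.2 above

print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q has orthonormal columns
print(np.allclose(Q @ R, A))             # True: A = Q R
print(np.allclose(np.triu(R), R))        # True: R is upper triangular
```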

2.4. Solving the linear least-squares problem via the QR factorization

  • Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, there exists a matrix $Q \in \mathbb{R}^{m \times n}$ with mutually orthonormal columns and an upper triangular matrix $R \in \mathbb{R}^{n \times n}$ such that $A = QR$. The vector $\hat{x}$ that is the best solution (in the linear least-squares sense) to $Ax \approx b$ is given by

    • $\hat{x} = (A^T A)^{-1} A^T b$ (as shown in Week 10), computed by solving the normal equations $A^T A x = A^T b$
    • $\hat{x} = R^{-1} Q^T b$, computed by solving $R x = Q^T b$
      • Notice $Q^T Q = I$ and $R$ is upper triangular.
      • The columns of $A$ must be linearly independent.
  • An algorithm for computing the QR factorization is given by Gram-Schmidt orthogonalization (Section 2.2); a least-squares solve that uses a QR factorization is sketched below.
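
A sketch of solving $Ax \approx b$ via $R x = Q^T b$ (made-up data; NumPy's `qr` and SciPy's `solve_triangular` are used here rather than any course-specific routines):

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

Q, R = np.linalg.qr(A)                              # reduced QR: Q is 4x2, R is 2x2
x_hat = solve_triangular(R, Q.T @ b, lower=False)   # solve R x = Q^T b

# Agrees with the built-in least-squares solution.
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```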

Change of Basis

  • Consider a vector $b \in \mathbb{R}^m$ and a square matrix $Q \in \mathbb{R}^{m \times m}$ with mutually orthonormal columns.
  • $Q^T Q = Q Q^T = Q^{-1} Q = Q Q^{-1} = I$
  • $b = Q Q^T b = \left( q_0 \mid q_1 \mid \cdots \mid q_{m-1} \right) \left( \begin{array}{c} q_0^T \\ q_1^T \\ \vdots \\ q_{m-1}^T \end{array} \right) b = q_0^T b \, q_0 + q_1^T b \, q_1 + \ldots + q_{m-1}^T b \, q_{m-1}$
    • $q_0^T b \, q_0 = q_0 q_0^T b$ because $q_0^T b$ is a scalar.
    • Notice that each of these terms is just the component of the vector $b$ in the direction of the given basis vector.
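
A small sketch of this change of basis; the orthonormal basis is generated from random data just to obtain a square $Q$ with orthonormal columns:

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # square Q with Q^T Q = Q Q^T = I
b = rng.standard_normal(4)

coeffs = Q.T @ b                                   # q_i^T b for each basis vector
b_rebuilt = sum(coeffs[i] * Q[:, i] for i in range(4))

print(np.allclose(b_rebuilt, b))                   # True: b = sum_i (q_i^T b) q_i
```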

2.5. Singular Value Decomposition

  • Any matrix $A \in \mathbb{R}^{m \times n}$ can be written as the product of three matrices, the Singular Value Decomposition (SVD): $A = U \Sigma V^T$, where

    • $U \in \mathbb{R}^{m \times r}$ and $U^T U = I$ ($U$ has orthonormal columns).
    • $\Sigma \in \mathbb{R}^{r \times r}$ is a diagonal matrix with positive diagonal elements that are ordered so that $\sigma_{0,0} \ge \sigma_{1,1} \ge \ldots \ge \sigma_{(r-1),(r-1)} > 0$.
    • $V \in \mathbb{R}^{n \times r}$ and $V^T V = I$ ($V$ has orthonormal columns).
    • $r$ equals the rank of matrix $A$.
  • If we partition $U = \left( U_L \mid U_R \right)$, $\Sigma = \left( \begin{array}{cc} \Sigma_{TL} & 0 \\ 0 & \Sigma_{BR} \end{array} \right)$, and $V = \left( V_L \mid V_R \right)$,
  • where $U_L$ and $V_L$ have $k$ columns and $\Sigma_{TL}$ is $k \times k$, then $U_L \Sigma_{TL} V_L^T$ is the "best" rank-k approximation to matrix $B$ (with $B = U \Sigma V^T$ its SVD). So, the "best" rank-k approximation $B \approx A W^T$ is given by the choices $A = U_L$ and $W = V_L \Sigma_{TL}$.
    • Given $A \in \mathbb{R}^{m \times n}$ with linearly independent columns, and $b \in \mathbb{R}^m$, the "best" solution to $Ax \approx b$ (in the linear least-squares sense) via its SVD, $A = U \Sigma V^T$, is given by $\begin{aligned}\hat{x} &= (A^T A)^{-1} A^T b \\ &= ((U \Sigma V^T)^T U \Sigma V^T)^{-1} (U \Sigma V^T)^T b \\ &= (V \Sigma^2 V^T)^{-1} V \Sigma U^T b \\ &= V \Sigma^{-1} U^T b \end{aligned}$ (using $U^T U = I$ and, since the columns of $A$ are linearly independent, $r = n$ so that $V$ is square with $V V^T = V^T V = I$).
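
Both uses of the SVD above, rank-k approximation and linear least squares, sketched on made-up data with NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)

# Rank-k approximation: keep the k largest singular values of B.
B = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(B, full_matrices=False)
k = 2
B_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]         # U_L Sigma_TL V_L^T
print(np.linalg.matrix_rank(B_k))                   # 2

# Linear least squares via the SVD: x_hat = V Sigma^{-1} U^T b.
A = rng.standard_normal((6, 3))                     # independent columns (generically)
b = rng.standard_normal(6)
Ua, sa, Vta = np.linalg.svd(A, full_matrices=False)
x_hat = Vta.T @ ((Ua.T @ b) / sa)
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```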
