Week 4 - Matrix-Vector to Matrix-Matrix Multiplication

1. Preparation

1.1. Partitioned Matrix-vector Multiplication

  • $\left( \begin{array}{ c | c | c | c } A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{array} \right) \left( \begin{array}{ c } x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{array} \right) = \left( \begin{array}{ c } A_{0,0} x_{0} + A_{0,1} x_{1} + \cdots + A_{0,N-1} x_{N-1} \\ A_{1,0} x_{0} + A_{1,1} x_{1} + \cdots + A_{1,N-1} x_{N-1} \\ \vdots \\ A_{M-1,0} x_{0} + A_{M-1,1} x_{1} + \cdots + A_{M-1,N-1} x_{N-1} \end{array} \right)$

  • There are two different algorithms for computing matrix-vector multiplication (see the sketch after this list):

    • The first computes by rows, via dot products:
      • $\psi_1 := a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1$
    • The second computes by columns, via axpy updates:
      • $y_0 := \chi_1 a_{01} + y_0$
      • $\psi_1 := \chi_1 \alpha_{11} + \psi_1$
      • $y_2 := \chi_1 a_{21} + y_2$
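
A minimal NumPy sketch of the two variants (the function names are ours, not from the notes):

```python
import numpy as np

def matvec_by_rows(A, x, y):
    """y := A x + y via dot products: y[i] accumulates row i of A dotted with x."""
    m, n = A.shape
    for i in range(m):
        y[i] += A[i, :] @ x
    return y

def matvec_by_columns(A, x, y):
    """y := A x + y via axpy updates: column j of A, scaled by x[j], is added to y."""
    m, n = A.shape
    for j in range(n):
        y += x[j] * A[:, j]
    return y

A = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([1.0, 1.0])
print(matvec_by_rows(A, x, np.zeros(2)))      # [3. 7.]
print(matvec_by_columns(A, x, np.zeros(2)))   # [3. 7.]
```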

1.2. Transposing a Partitioned Matrix

  • $\left( \begin{array}{c | c | c | c} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{array} \right)^T = \left( \begin{array}{c | c | c | c} A_{0,0}^T & A_{1,0}^T & \cdots & A_{M-1,0}^T \\ A_{0,1}^T & A_{1,1}^T & \cdots & A_{M-1,1}^T \\ \vdots & \vdots & & \vdots \\ A_{0,N-1}^T & A_{1,N-1}^T & \cdots & A_{M-1,N-1}^T \end{array} \right).$

  • Example: a small worked case is given below.
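
For instance (a small $3 \times 3$ example of ours, not from the original notes), partition so that $A_{0,0}$ is the $2 \times 2$ top-left block:

$$
\left( \begin{array}{c c | c} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \hline 7 & 8 & 9 \end{array} \right)^T
=
\left( \begin{array}{c c | c} 1 & 4 & 7 \\ 2 & 5 & 8 \\ \hline 3 & 6 & 9 \end{array} \right).
$$

The $2 \times 1$ block $A_{0,1} = \left( \begin{array}{c} 3 \\ 6 \end{array} \right)$ ends up, transposed, in the $(1,0)$ block position of the result, exactly as the formula predicts.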

1.3. Matrix-Vector Multiplication with Special Matrices

  • When analyzing an algorithm, we are always concerned with flops (floating point operations) and memops (memory operations). Later we will see that a memop is much slower than a flop.
  • Let's take an example. Suppose we want to compute $y := A^T x + y$. A naive approach is to set $B = A^T$ and then compute $y := Bx + y$.
    • This means we spend most of the time transposing $A$ into $B$ (pure memops), and comparatively little time computing.
    • What we want is to compute with $A^T$ without actually transposing.
    • The solution: simply use the columns of $A$ for the dot products in the dot-product-based algorithm for $y := Ax + y$ (a sketch follows this list).
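
A minimal NumPy sketch of this idea (the name transposed_matvec is ours): row $j$ of $A^T$ is column $j$ of $A$, so each dot product reads a column of $A$ and no transpose is ever formed.

```python
import numpy as np

def transposed_matvec(A, x, y):
    """Compute y := A^T x + y without forming A^T explicitly.

    Row j of A^T is column j of A, so the dot-product-based
    algorithm simply reads columns of A instead of rows.
    """
    m, n = A.shape
    for j in range(n):
        y[j] += A[:, j] @ x   # dot product with column j of A
    return y

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 x 2
x = np.array([1.0, 1.0, 1.0])
print(transposed_matvec(A, x, np.zeros(2)))   # same as A.T @ x: [ 9. 12.]
```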

Triangular Matrix-Vector Multiplication

  • Let $U \in \mathbb{R}^{n \times n}$ be an upper triangular matrix and $x \in \mathbb{R}^n$ be a vector, and consider computing $y := Ux$.
    • We notice that $u_{10}^T = 0$ (a vector of zeroes) and hence we need not compute with it.
    • Let's count the flops (a code sketch follows this list):
      • The update step is $\psi_1 := \upsilon_{11} \chi_1 + u_{12}^T x_2 + \psi_1$ (the term $u_{10}^T x_0$ is skipped because $u_{10}^T = 0$).
      • If $U_{00}$ is $k \times k$, then $u_{12}^T$ and $x_2$ have length $n - k - 1$.
      • flops($\upsilon_{11} \chi_1$) = 1; flops($u_{12}^T x_2$) = $2(n-k-1)$; adding the two terms into $\psi_1$ costs 1 more flop, for $2 + 2(n-k-1)$ per iteration.
      • So the total is $\sum_{k=0}^{n-1} \left( 2 + 2(n-k-1) \right) = \sum_{k=0}^{n-1} 2(n-k) = n(n+1)$ flops, about half the $2n^2$ of a general matrix-vector multiply.
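
A minimal NumPy sketch (the name upper_triangular_matvec is ours): each dot product starts at the diagonal, so only the nonzero upper half of $U$ is touched.

```python
import numpy as np

def upper_triangular_matvec(U, x):
    """Compute y := U x for upper triangular U, skipping the zero part
    below the diagonal."""
    n = U.shape[0]
    y = np.zeros(n)
    for i in range(n):
        # Row i of U is zero in columns 0..i-1, so start at the diagonal.
        y[i] = U[i, i:] @ x[i:]
    return y

U = np.array([[1.0, 2.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])
x = np.array([1.0, 1.0, 1.0])
print(upper_triangular_matvec(U, x))   # same as U @ x: [6. 9. 6.]
```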

Symmetric Matrix-Vector Multiplication

  • We purposely chose the matrix to be symmetric. We notice that $a_{10} = a_{01}$, $A_{20}^T = A_{02}$, and $a_{21} = a_{12}$.
  • So we just need to change the update steps so that only the upper triangle is referenced (see the sketch after this list):
    • By rows:
      • from $\psi_1 := a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1$
      • to $\psi_1 := a_{01}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 + \psi_1$
    • By columns:
      • from
        • $y_0 := \chi_1 a_{01} + y_0$
        • $\psi_1 := \chi_1 \alpha_{11} + \psi_1$
        • $y_2 := \chi_1 a_{21} + y_2$
      • to
        • $y_0 := \chi_1 a_{01} + y_0$
        • $\psi_1 := \chi_1 \alpha_{11} + \psi_1$
        • $y_2 := \chi_1 (a_{12}^T)^T + y_2$
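
A minimal NumPy sketch of the row-based variant (the name symmetric_matvec is ours; we assume $A$ is stored only in its upper triangle): entries below the diagonal are read from their mirrored upper-triangle counterparts.

```python
import numpy as np

def symmetric_matvec(A, x):
    """Compute y := A x for symmetric A, referencing only its upper triangle."""
    n = A.shape[0]
    y = np.zeros(n)
    for i in range(n):
        # By symmetry a_{i,j} = a_{j,i} for j < i, so read column i
        # above the diagonal instead of row i below it.
        y[i] = A[:i, i] @ x[:i] + A[i, i:] @ x[i:]
    return y

# Only the upper triangle is ever touched; lower entries may be garbage.
A = np.array([[2.0, 1.0, 3.0],
              [0.0, 4.0, 5.0],
              [0.0, 0.0, 6.0]])
x = np.array([1.0, 1.0, 1.0])
print(symmetric_matvec(A, x))   # [ 6. 10. 14.]
```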

2. Composing linear transformations

  • Let $L_A: \mathbb{R}^k \rightarrow \mathbb{R}^m$ and $L_B: \mathbb{R}^n \rightarrow \mathbb{R}^k$ both be linear transformations and, for all $x \in \mathbb{R}^n$, define the function $L_C: \mathbb{R}^n \rightarrow \mathbb{R}^m$ by $L_C(x) = L_A(L_B(x))$. Then $L_C(x)$ is a linear transformation.
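
A quick NumPy check of this fact (the matrices below are arbitrary examples of ours): if $A$ and $B$ represent $L_A$ and $L_B$, then applying $B$ and then $A$ agrees with applying the single matrix $C = AB$, which is what motivates the definition of matrix-matrix multiplication.

```python
import numpy as np

# L_A : R^k -> R^m and L_B : R^n -> R^k as matrices (m=2, k=3, n=4).
A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
x = np.ones(4)

# The composition L_A(L_B(x)) equals multiplication by C = A B.
C = A @ B
print(np.allclose(A @ (B @ x), C @ x))   # True
```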

3. Matrix-matrix multiplication

  • $AB = A \left( \begin{array}{c | c | c | c} b_0 & b_1 & \cdots & b_{n-1} \end{array} \right) = \left( \begin{array}{c | c | c | c} A b_0 & A b_1 & \cdots & A b_{n-1} \end{array} \right).$
  • If $C = \left( \begin{array}{c c c c} \gamma_{0,0} & \gamma_{0,1} & \cdots & \gamma_{0,n-1} \\ \gamma_{1,0} & \gamma_{1,1} & \cdots & \gamma_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \gamma_{m-1,0} & \gamma_{m-1,1} & \cdots & \gamma_{m-1,n-1} \end{array} \right)$, $A = \left( \begin{array}{c c c c} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,k-1} \\ \alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,k-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,k-1} \end{array} \right)$, and $B = \left( \begin{array}{c c c c} \beta_{0,0} & \beta_{0,1} & \cdots & \beta_{0,n-1} \\ \beta_{1,0} & \beta_{1,1} & \cdots & \beta_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \beta_{k-1,0} & \beta_{k-1,1} & \cdots & \beta_{k-1,n-1} \end{array} \right)$,
  • then $C = AB$ means that $\gamma_{i,j} = \sum_{p=0}^{k-1} \alpha_{i,p} \beta_{p,j}$.
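
A direct NumPy sketch of this element-wise definition (the name matmul_elementwise is ours): three nested loops, one per index $i$, $j$, $p$.

```python
import numpy as np

def matmul_elementwise(A, B):
    """C := A B via the definition gamma_{i,j} = sum_p alpha_{i,p} beta_{p,j}."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
print(np.allclose(matmul_elementwise(A, B), A @ B))   # True
```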

  • A table of matrix-matrix multiplications with matrices of special shape is given at the end of this unit.

3.1. Outer product

  • Let $x \in \mathbb{R}^m$ and $y \in \mathbb{R}^n$. Then the outer product of $x$ and $y$ is given by $xy^T$. Notice that this yields an $m \times n$ matrix: $xy^T = \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array} \right) \left( \begin{array}{c} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{n-1} \end{array} \right)^T = \left( \begin{array}{c} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{m-1} \end{array} \right) \left( \begin{array}{c c c c} \psi_0 & \psi_1 & \cdots & \psi_{n-1} \end{array} \right) = \left( \begin{array}{c c c c} \chi_0 \psi_0 & \chi_0 \psi_1 & \cdots & \chi_0 \psi_{n-1} \\ \chi_1 \psi_0 & \chi_1 \psi_1 & \cdots & \chi_1 \psi_{n-1} \\ \vdots & \vdots & & \vdots \\ \chi_{m-1} \psi_0 & \chi_{m-1} \psi_1 & \cdots & \chi_{m-1} \psi_{n-1} \end{array} \right).$
  • The cost of the matrix-matrix product $C := AB$ (with $C$ being $m \times n$ and $A$ being $m \times k$) is $2mnk$ flops, while only roughly $mk + kn + 2mn$ memops are needed to read $A$ and $B$ and to read and write $C$.
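
One way to connect the outer product back to matrix-matrix multiplication (a sketch with our own names, not from the notes): $C = AB$ can be computed as a sum of $k$ rank-1 updates, one outer product per column of $A$ and matching row of $B$.

```python
import numpy as np

def matmul_by_outer_products(A, B):
    """C := A B as a sum of rank-1 updates: C = sum_p outer(a_p, b_p^T),
    where a_p is column p of A and b_p^T is row p of B."""
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for p in range(k):
        C += np.outer(A[:, p], B[p, :])   # rank-1 (outer product) update
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
print(np.allclose(matmul_by_outer_products(A, B), A @ B))   # True
```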

3.2. Flops and Memops
