Matrix Decomposition

Ming Yang
Electrical and Computer Engineering
Northwestern University
Evanston, IL 60208
[email protected]

Contents

1 Overview
2 Matrix Multiplication and Definitions
  2.1 Matrix Multiplication
  2.2 Special Matrix Definitions
3 Singular Value Decomposition
  3.1 SVD Decomposition
  3.2 Corollary of SVD
4 LU and Cholesky Decomposition
  4.1 Elementary Operations and the Gaussian Transform
  4.2 LU Decomposition
  4.3 Cholesky Decomposition
5 QR Decomposition
  5.1 Householder Reflections and Givens Rotations
  5.2 Gram-Schmidt Orthonormalization
  5.3 QR Decomposition
  5.4 Least Squares Fitting
6 Schur Decomposition and Eigenvalue Decomposition
  6.1 Schur Decomposition
  6.2 Eigenvalue Decomposition
  6.3 Hessenberg Decomposition
7 Biconjugate Decomposition
  7.1 Biconjugate Decomposition
  7.2 Summary

1 Overview

"Matrix decomposition refers to the transformation of a given matrix into a given canonical form" [1]. When the given matrix is expressed as a product of canonical matrices, the process is also called "matrix factorization". Matrix decomposition is a fundamental theme in linear algebra and applied statistics with both scientific and engineering significance. Its purposes typically involve two aspects: computational convenience and analytic simplicity. In practice it is not feasible to carry out most matrix computations, such as matrix inversion, computing determinants, solving linear systems, and least squares fitting, in an explicit closed form; converting a difficult matrix computation into several easier tasks, such as solving triangular or diagonal systems, greatly facilitates the calculations. Data matrices representing numerical observations, such as proximity or correlation matrices, are often huge and hard to analyze; decomposing them into lower-order or lower-rank canonical forms reveals their inherent characteristics and structure and helps to interpret their meaning readily. This tutorial is primarily a summary of important matrix decomposition methods. We first present some basic concepts in Section 2 and then introduce several fundamental matrix decomposition methods in the subsequent sections, e.g. the SVD, LU, QR, and eigenvalue decompositions. A unified view of matrix factorization derived from the Wedderburn rank-one reduction theorem is briefly discussed in Section 7.

2 Matrix Multiplication and Definitions

2.1 Matrix Multiplication

Suppose A ∈ R^{m×r} and B ∈ R^{r×n}. The matrix product C = AB can be viewed from three different perspectives as follows.

Dot Product Matrix Multiply. Every element c_ij of C is the dot product of the i-th row vector a_i^T of A and the j-th column vector b_j of B:

    A = (a_1, ..., a_m)^T partitioned by rows, a_k ∈ R^r,
    B = (b_1, ..., b_n) partitioned by columns, b_k ∈ R^r,
    C = (c_ij),  c_ij = a_i^T b_j.

Column Combination Matrix Multiply. Every column c_j of C is a linear combination of the column vectors a_k of A, with the entries b_kj of the j-th column of B as the weight coefficients:

    A = (a_1, ..., a_r), a_i ∈ R^m,
    B = (b_1, ..., b_n), b_j ∈ R^r,
    C = (c_1, ..., c_n), c_j ∈ R^m,
    c_j = Σ_{k=1}^{r} b_kj a_k,  j = 1 : n.

Outer Product Matrix Multiply. C is the sum of r matrices, each of which is an outer product of a column vector of A and the corresponding row vector of B, and hence a rank-one matrix:

    A = (a_1, ..., a_r), a_i ∈ R^m,
    B = (b_1^T, ..., b_r^T)^T partitioned by rows, b_k ∈ R^n,
    C = Σ_{k=1}^{r} a_k b_k^T.
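All three views produce the same product. The following NumPy sketch (an illustration added for this tutorial; the variable names are arbitrary) checks this on a small random example:

    import numpy as np

    rng = np.random.default_rng(0)
    m, r, n = 4, 3, 5
    A = rng.standard_normal((m, r))
    B = rng.standard_normal((r, n))

    # dot-product view: c_ij = a_i^T b_j
    C_dot = np.array([[A[i, :] @ B[:, j] for j in range(n)] for i in range(m)])
    # column-combination view: c_j = sum_k b_kj a_k
    C_col = np.column_stack([sum(B[k, j] * A[:, k] for k in range(r)) for j in range(n)])
    # outer-product view: C = sum_k a_k b_k^T, a sum of r rank-one matrices
    C_out = sum(np.outer(A[:, k], B[k, :]) for k in range(r))

    assert np.allclose(C_dot, A @ B)
    assert np.allclose(C_col, A @ B)
    assert np.allclose(C_out, A @ B)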

2.2 Special Matrix Definitions

Before further discussion, we first present definitions of some special matrices; here we follow the terms in [2].

Definition 1 A real matrix A is a symmetric matrix if it equals its own transpose, that is, A = A^T.

Definition 2 A complex matrix A is a Hermitian matrix if it equals its own complex conjugate transpose, that is, A = A^H.

Definition 3 A real matrix Q is an orthogonal matrix if the inverse of Q equals the transpose of Q, Q^{-1} = Q^T, that is, QQ^T = Q^T Q = I.

Definition 4 A complex matrix U is a unitary matrix if the inverse of U equals the complex conjugate transpose of U, U^{-1} = U^H, that is, UU^H = U^H U = I.

Definition 5 A matrix A ∈ R^{n×n} is positive definite if x^T A x > 0 for all nonzero x ∈ R^n. Positive definite matrices have positive definite principal sub-matrices, and all their diagonal entries are positive.

Definition 6 Suppose S ⊆ R^n is a subspace with orthonormal basis V = (v_1, ..., v_k). Then P = VV^T ∈ R^{n×n} is the orthogonal projection matrix onto S, satisfying range(P) = S, P^2 = P, and P^T = P. P is unique for the subspace S.

Hermitian and unitary matrices are the complex counterparts of symmetric and orthogonal matrices in R; the following theorems in R can be readily transferred to the corresponding forms in C by substituting the conjugate transpose for the transpose and unitary matrices for orthogonal matrices. Therefore, for simplicity, we present most of the matrix decomposition results in R.
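As a small numerical illustration of Definition 6 (a sketch added here, using an arbitrarily chosen orthonormal basis), one can build V, form P = VV^T, and verify the projector properties:

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 5, 2
    # orthonormal basis V (n-by-k) of a k-dimensional subspace S of R^n
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    P = V @ V.T                       # orthogonal projector onto S = range(V)

    assert np.allclose(P @ P, P)      # idempotent: P^2 = P
    assert np.allclose(P.T, P)        # symmetric: P^T = P
    assert np.linalg.matrix_rank(P) == k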

3 Singular Value Decomposition

Suppose A ∈ R^{m×n}. The span of the column vectors of A, namely range(A), is a subspace of R^m; similarly, range(A^T) is a subspace of R^n, and the two subspaces have the same dimension, which equals the rank of A. The SVD reveals orthonormal bases of range(A) and range(A^T), together with the respective scale factors σ_i, simultaneously.

3.1 SVD Decomposition

Theorem 1 Singular Value Decomposition (SVD). If A ∈ R^{m×n}, then there exist orthogonal matrices U = (u_1, ..., u_m) ∈ R^{m×m} and V = (v_1, ..., v_n) ∈ R^{n×n} and a diagonal matrix Σ = diag(σ_1, ..., σ_p) ∈ R^{m×n}, p = min(m, n), such that

    A = UΣV^T,  where σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0.

Proof 1 Let σ_1 = ||A||_2 = max_{||v||_2 = 1} ||Av||_2. Then there exist unit 2-norm vectors v_1 ∈ R^n and u_1 ∈ R^m such that

    ||Av_1||_2 = σ_1,  u_1 = Av_1 / σ_1,  and therefore Av_1 = σ_1 u_1.

Any orthonormal set can be extended to an orthonormal basis of the whole space, so we can find V_1 ∈ R^{n×(n-1)} and U_1 ∈ R^{m×(m-1)} such that V = (v_1, V_1) ∈ R^{n×n} and U = (u_1, U_1) ∈ R^{m×m} are orthogonal. Thus

    A_1 := U^T A V = ( σ_1  ω^T )
                     ( 0    B   ),  where ω^T = u_1^T A V_1 and B = U_1^T A V_1,

using u_1^T A v_1 = σ_1 ||u_1||_2^2 = σ_1 and U_1^T A v_1 = σ_1 U_1^T u_1 = 0. Consider the vector x = (σ_1, ω^T)^T ∈ R^n. Since the first component of A_1 x equals σ_1^2 + ω^T ω,

    ||A_1 x||_2^2 ≥ (σ_1^2 + ω^T ω)^2,

so the 2-norm of A_1 satisfies

    ||A_1||_2 = sup_{x ≠ 0} ||A_1 x||_2 / ||x||_2 ≥ (σ_1^2 + ω^T ω) / √(σ_1^2 + ω^T ω) = √(σ_1^2 + ω^T ω).

Since U and V are orthogonal, ||A_1||_2 = ||A||_2 = σ_1, hence ω = 0. An induction argument applied to the submatrix B completes the proof.

The σ_i are the singular values of A, and the vectors u_i and v_i are the left and right singular vectors, which satisfy

    Av_i = σ_i u_i  and  A^T u_i = σ_i v_i.

3.2 Corollary of SVD

The SVD reveals many intrinsic properties of a matrix A and is numerically stable for calculations.

Corollary 1 If A = UΣV^T is an SVD of A with σ_1 ≥ σ_2 ≥ ... ≥ σ_r > σ_{r+1} = ... = σ_p = 0, we have the following statements:

1. rank(A) = r.
2. null(A) = span{v_{r+1}, ..., v_n}.
3. range(A) = span{u_1, ..., u_r}.
4. A = Σ_{j=1}^{r} σ_j u_j v_j^T = U_r Σ_r V_r^T, where U_r = (u_1, ..., u_r), V_r = (v_1, ..., v_r), Σ_r = diag(σ_1, ..., σ_r).
5. ||A||_F = √(σ_1^2 + ... + σ_p^2).
6. ||A||_2 = σ_1.
7. σ_j = √(λ_j(A^T A)), j = 1, ..., p, where λ_j(A^T A) is the j-th largest eigenvalue of A^T A.
8. The v_i are orthonormalized eigenvectors of A^T A, and the u_i are orthonormalized eigenvectors of AA^T.

The SVD has been generalized to simultaneously diagonalize two matrices [3] and to decompositions that employ different metrics in the normalizations [4].
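The statements of Corollary 1 are easy to check numerically; the sketch below (illustrative only) uses NumPy's SVD routine:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((6, 4))
    U, s, Vt = np.linalg.svd(A)            # s holds sigma_1 >= ... >= sigma_p
    Sigma = np.zeros(A.shape)
    np.fill_diagonal(Sigma, s)

    assert np.allclose(A, U @ Sigma @ Vt)                        # A = U Sigma V^T
    assert np.isclose(np.linalg.norm(A, 2), s[0])                # ||A||_2 = sigma_1
    assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s ** 2)))
    # sigma_j^2 are the eigenvalues of A^T A (item 7)
    lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
    assert np.allclose(np.sqrt(np.maximum(lam, 0)), s)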

4 LU and Cholesky Decomposition

Solving the linear system Ax = b is a basic problem in linear algebra. Theoretically, when A is a non-singular square matrix there exists a unique solution x = A^{-1} b; however, the inverse of a matrix is typically not easy to compute. We therefore transform A into triangular systems, which are much easier to solve by forward or backward substitution; this process is referred to as Gaussian elimination [5]. It can be summarized in matrix form as the LU decomposition, together with a series of variants that apply when the matrix A has extra properties.

4.1 Elementary Operations and the Gaussian Transform

For a square matrix A, the following three operations are referred to as elementary row (column) operations of type 1, 2, and 3 respectively:

1. interchanging two rows (or columns) of A;
2. multiplying all elements of a row (or column) of A by some nonzero number;
3. adding to any row (or column) of A any other row (or column) of A multiplied by a nonzero number.

These operations can be implemented by pre- or post-multiplying with appropriate matrices called elementary matrices. A type 3 row elementary matrix is the identity matrix plus a single off-diagonal entry τ,

    E^(3) = I + τ e_i e_j^T,  i ≠ j,

which is lower triangular when i > j and upper triangular when i < j. The Gaussian elimination process can be described as multiplication by type 3 lower triangular elementary matrices. For x ∈ R^n with x_k ≠ 0, the Gaussian transform is defined as the matrix M_k = I − τ e_k^T, where the Gauss vector τ and the unit vector e_k are

    τ^T = (0, ..., 0, τ_{k+1}, ..., τ_n),  τ_i = x_i / x_k,  i = k+1 : n,
    e_k^T = (0, ..., 0, 1, 0, ..., 0) with the 1 in the k-th position.

M_k is the identity except for the entries −τ_{k+1}, ..., −τ_n below the diagonal in column k, and pre-multiplying x by M_k zeroes elements k+1 through n of x:

    M_k x = (x_1, ..., x_k, 0, ..., 0)^T.

It is easy to verify that a Gaussian transform matrix is a product of lower triangular type 3 elementary matrices with det(M_k) = 1. So, by multiplying with a series of Gaussian transform matrices, the lower part of A can be gradually zeroed, provided the pivots encountered during the process remain nonzero. This process is summarized as the LU decomposition.

4.2 LU Decomposition

Theorem 2 LU Decomposition. Let A ∈ R^{n×n} with all leading principal minors det(A(1:k, 1:k)) ≠ 0, k = 1, ..., n−1. Then there exist a unique unit lower triangular matrix L, with diagonal elements all equal to one, and a unique upper triangular matrix U such that A = LU, and det(A) = u_11 u_22 ... u_nn.

Proof 2 Given a_11 ≠ 0, we can find a Gaussian transform M_1 to zero a_21, ..., a_n1. Suppose that after step k−1 we have M_{k−1} ... M_1 A = A^{(k−1)}, and consider the k×k leading portion of this equation: since Gaussian transforms are unit lower triangular with determinant one,

    det(A(1:k, 1:k)) = a_11^{(k−1)} a_22^{(k−1)} ... a_kk^{(k−1)} ≠ 0.

Therefore the k-th pivot a_kk^{(k−1)} ≠ 0, and we can proceed to find the Gaussian transform M_k.

If A = L_1 U_1 = L_2 U_2 are two LU decompositions of a non-singular A, then L_2^{-1} L_1 = U_2 U_1^{-1}; since the left-hand side is unit lower triangular while the right-hand side is upper triangular, both matrices must be the identity, hence L_1 = L_2 and U_1 = U_2. If A = LU, then det(A) = det(LU) = det(L) det(U) = u_11 u_22 ... u_nn.

For the linear system Ax = b, if we pre-compute the LU decomposition A = LU, the problem reduces to solving two triangular systems, Ly = b and Ux = y, which can be done much more readily. Moreover, when the system has to be solved for many different right-hand sides b, such as the response of a circuit under different excitations, the LU decomposition approach is very efficient.
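The following SciPy sketch (illustrative; SciPy's routine uses partial pivoting, i.e. it factors PA = LU, a practical refinement of the decomposition above) factors A once and reuses the factorization for several right-hand sides:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    lu, piv = lu_factor(A)                 # one factorization (with partial pivoting)

    for _ in range(3):                     # reuse it for many right-hand sides b
        b = rng.standard_normal(4)
        x = lu_solve((lu, piv), b)         # two triangular solves per b
        assert np.allclose(A @ x, b)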

4.3 Cholesky Decomposition

If the matrix A has additional properties, the LU decomposition takes particular forms. In this section we present the specialized LU decompositions for symmetric and for positive definite matrices. First we express the LU decomposition in an equivalent way.

Theorem 3 LDM^T Decomposition. Let A ∈ R^{n×n} with all leading principal minors det(A(1:k, 1:k)) ≠ 0, k = 1, ..., n−1. Then there exist unique unit lower triangular matrices L and M and a unique diagonal matrix D = diag(d_1, ..., d_n) such that A = LDM^T.

If A has an LU decomposition A = LU and we let D = diag(u_11, ..., u_nn), then M^T = D^{-1} U is unit upper triangular, so A = LU = LD(D^{-1}U) = LDM^T. Uniqueness follows from the uniqueness of the LU decomposition. Further, if A is symmetric, then A = LDM^T = A^T = MDL^T, and MDL^T is also an LDM^T decomposition of A; from uniqueness we have L = M.

Theorem 4 LDL^T Decomposition. If A ∈ R^{n×n} is a non-singular symmetric matrix with LDM^T decomposition A = LDM^T, then L = M and A = LDL^T.

For a positive definite matrix A, the matrix D = diag(d_1, ..., d_n) in the LDM^T decomposition has positive diagonal entries. So we can further specialize the LU decomposition for symmetric positive definite matrices.

Theorem 5 Cholesky Decomposition. If A ∈ R^{n×n} is symmetric positive definite, then there exists a unique lower triangular G ∈ R^{n×n} with positive diagonal entries such that A = GG^T; G is referred to as the Cholesky triangle.

In the LDL^T decomposition of a symmetric positive definite A, the entries of the diagonal matrix D = diag(d_1, ..., d_n) are all positive, so G = L diag(√d_1, ..., √d_n) is lower triangular with positive diagonal entries and A = GG^T; uniqueness follows from the uniqueness of the LDL^T decomposition.
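A quick check of Theorem 5 (a sketch with an arbitrarily constructed positive definite matrix):

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((5, 5))
    A = B @ B.T + 5 * np.eye(5)            # symmetric positive definite by construction
    G = np.linalg.cholesky(A)              # lower triangular Cholesky triangle

    assert np.allclose(A, G @ G.T)
    assert np.all(np.diag(G) > 0)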

5 QR Decomposition

If the linear system Ax = b is overdetermined, namely A ∈ R^{m×n} with m ≥ n and b ∈ R^m, an exact solution may not exist, so we use instead the least squares solution that minimizes ||Ax − b||_2. In this section we present several methods to construct the QR decomposition and show how to compute the least squares fit with the QR, LU, and SVD decompositions.

5.1 Householder Reflections and Givens Rotations

Let v ∈ R^n be nonzero. An n-by-n matrix P of the form

    P = I − 2vv^T / (v^T v)

is called a Householder reflection (or Householder matrix, or Householder transformation), and v is called a Householder vector. Geometrically, when a vector x is multiplied by P it is reflected in the hyperplane span{v}^⊥, the orthogonal complement of v. Householder reflections can be used to zero selected entries of a vector in a way similar to Gauss transforms, but while Gauss transforms are unit lower triangular, Householder matrices are orthogonal and symmetric:

    PP^T = PP = (I − 2vv^T/v^Tv)(I − 2vv^T/v^Tv) = I − 4vv^T/v^Tv + 4vv^T vv^T/(v^Tv)^2 = I.

Given a vector x ∈ R^n, x ≠ 0, we show that there exists a Householder reflection that zeroes all but the first element of x, i.e. Px ∈ span{e_1}. Let v = x + αe_1 with α = ||x||_2. Then

    Px = x − (2v^T x / v^T v) v
       = x − (2(x^T x + αx_1) / (x^T x + 2αx_1 + α^2)) (x + αe_1)
       = x − (2(α^2 + αx_1) / (2α^2 + 2αx_1)) (x + αe_1)
       = −αe_1.

Furthermore, to zero all but the first two entries of x = (x_1, x_2, ..., x_n)^T, we can take the Householder vector v' = x' + ||x'||_2 e_1 for x' = (x_2, ..., x_n)^T and extend it to v = (0, v'^T)^T. Continuing in this fashion, a series of Householder reflections reduces a matrix A to upper triangular form.

Householder reflections introduce zeros into all but the first element of a vector; Givens rotations are able to zero one selected element. For a vector x = (x_1, ..., x_i, ..., x_k, ..., x_n)^T with x_k ≠ 0, the Givens rotation G(i, k, θ) forces x_k to zero, where G(i, k, θ) is the identity matrix except for the four entries

    G_ii = G_kk = c,  G_ik = s,  G_ki = −s,

with c = cos(θ) = x_i / √(x_i^2 + x_k^2) and s = sin(θ) = x_k / √(x_i^2 + x_k^2). Geometrically, a Givens rotation amounts to a rotation through the angle θ in the (i, k) coordinate plane. It is easy to check that G(i, k, θ) is also orthogonal, and by pre-multiplication with a series of Givens rotations we can zero the lower part of a matrix.
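Both transformations can be coded directly from the formulas above; the sketch below (illustrative helper functions, not a library API) builds a Householder reflection that zeroes all but the first entry of a vector and a Givens rotation that zeroes one selected entry:

    import numpy as np

    def householder(x):
        """P = I - 2 v v^T / (v^T v) with v = x + ||x||_2 e_1, so P x is a multiple of e_1."""
        v = x.astype(float).copy()
        v[0] += np.linalg.norm(x)
        return np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

    def givens(x, i, k):
        """G(i, k, theta) with c = x_i / r, s = x_k / r; pre-multiplying x zeroes entry k."""
        r = np.hypot(x[i], x[k])
        c, s = x[i] / r, x[k] / r
        G = np.eye(len(x))
        G[i, i], G[i, k], G[k, i], G[k, k] = c, s, -s, c
        return G

    x = np.array([3.0, 1.0, 4.0, 1.0])
    print(householder(x) @ x)     # approximately (-||x||_2, 0, 0, 0)
    print(givens(x, 0, 2) @ x)    # entry 2 becomes 0, entry 0 becomes sqrt(x_0^2 + x_2^2)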


5.2 Gram-Schmidt Orthonormalization

If A = (a_1, ..., a_n) ∈ R^{m×n} has linearly independent columns, then by subtracting from each a_k its projections onto the previously computed q_i (i < k) and normalizing, we can gradually orthonormalize the columns of A into an orthonormal set Q = (q_1, ..., q_n) as follows:

    q_1 = a_1 / r_11,                              r_11 = ||a_1||_2,
    q_2 = (a_2 − r_12 q_1) / r_22,                 r_12 = q_1^T a_2,  r_22 = ||a_2 − r_12 q_1||_2,
    q_3 = (a_3 − r_13 q_1 − r_23 q_2) / r_33,      r_13 = q_1^T a_3,  r_23 = q_2^T a_3,  r_33 = ||a_3 − r_13 q_1 − r_23 q_2||_2,
    ...
    q_k = (a_k − Σ_{i=1}^{k−1} r_ik q_i) / r_kk,   r_ik = q_i^T a_k,  r_kk = ||a_k − Σ_{i=1}^{k−1} r_ik q_i||_2.

Therefore a_k = Σ_{i=1}^{k} r_ik q_i, so A is the product of Q and an upper triangular matrix R = (r_ij); this is the classical Gram-Schmidt orthonormalization process. The process is sensitive to roundoff errors. The modified Gram-Schmidt process instead, once q_k is determined, subtracts the projection onto q_k from all the succeeding columns: writing a_i^(1) = a_i,

    for k = 1 to n
        r_kk = ||a_k^(k)||_2,  q_k = a_k^(k) / r_kk
        for i = k+1 to n
            r_ki = q_k^T a_i^(k),  a_i^(k+1) = a_i^(k) − r_ki q_k

After Q is calculated, by sequentially substituting a_k^(k) with the previous a_k^(i), i < k, we recover the representation of a_k with respect to q_1, ..., q_k:

    q_k = a_k^(k) / r_kk
        = (a_k^(k−1) − r_{k−1,k} q_{k−1}) / r_kk
        = (a_k^(k−2) − r_{k−2,k} q_{k−2} − r_{k−1,k} q_{k−1}) / r_kk
        = ...
        = (a_k − Σ_{i=1}^{k−1} r_ik q_i) / r_kk.

Thus a_k = Σ_{i=1}^{k} r_ik q_i, which again implies that A is the product of Q and an upper triangular R = (r_ij).
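A direct transcription of the modified Gram-Schmidt loop above (a sketch assuming full column rank; the names are arbitrary):

    import numpy as np

    def mgs(A):
        """Modified Gram-Schmidt: A (m x n, full column rank) -> Q (m x n), R (n x n), A = QR."""
        A = A.astype(float).copy()
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for k in range(n):
            R[k, k] = np.linalg.norm(A[:, k])
            Q[:, k] = A[:, k] / R[k, k]
            for i in range(k + 1, n):      # subtract the projection onto q_k from all later columns
                R[k, i] = Q[:, k] @ A[:, i]
                A[:, i] -= R[k, i] * Q[:, k]
        return Q, R

    A = np.random.default_rng(5).standard_normal((6, 4))
    Q, R = mgs(A)
    assert np.allclose(A, Q @ R)
    assert np.allclose(Q.T @ Q, np.eye(4))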

5.3 QR Decomposition

Theorem 6 QR Decomposition. Let A ∈ R^{m×n}. Then there exist an orthogonal matrix Q ∈ R^{m×m} and an upper triangular matrix R ∈ R^{m×n} such that A = QR.

All of the methods in the previous subsections can be viewed as different constructive proofs of the QR decomposition: Householder reflections, Givens rotations, and the Gram-Schmidt orthonormalization process and its modified version.


Corollary 2 Suppose A ∈ R^{m×n} has full column rank (m ≥ n) and A = QR is a QR decomposition, with column partitions A = (a_1, ..., a_n) and Q = (q_1, ..., q_m). Then

    span{a_1, ..., a_k} = span{q_1, ..., q_k},  k = 1 : n,
    range(A) = span{q_1, ..., q_n},
    range(A)^⊥ = span{q_{n+1}, ..., q_m}.

Let Q_1 = (q_1, ..., q_n), so that A = Q_1 R_1 with R_1 ∈ R^{n×n}; then G = R_1^T is the lower triangular Cholesky factor of A^T A.

The first part of the corollary follows easily from the Gram-Schmidt process. For the second part, A^T A = (Q_1 R_1)^T Q_1 R_1 = R_1^T R_1 = GG^T, so R_1, taken with positive diagonal entries, is the unique such upper triangular factor.
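The link between the thin QR factor R_1 and the Cholesky factor of A^T A can be checked numerically (a sketch; NumPy may return R_1 with negative diagonal entries, so its rows are re-signed first):

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((6, 4))                  # full column rank with probability one
    Q1, R1 = np.linalg.qr(A)                         # thin QR: Q1 is 6x4, R1 is 4x4

    S = np.sign(np.diag(R1))
    G = (R1 * S[:, None]).T                          # make the diagonal positive, then transpose
    assert np.allclose(A.T @ A, G @ G.T)
    assert np.allclose(G, np.linalg.cholesky(A.T @ A))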

5.4 Least Squares Fitting

Consider the overdetermined system Ax = b, where the data matrix A ∈ R^{m×n} and the observation vector b ∈ R^m with m ≥ n; typically the system has no exact solution when b is not an element of range(A). The goal of the least squares (LS) fitting problem is to find x ∈ R^n that minimizes J = ||Ax − b||_p; for p = 2 the objective is analytically tractable. If A does not have full column rank, the LS solution is not unique: if x minimizes J then so does x + z for any z ∈ null(A). Assuming A has full column rank, the unique LS solution x_LS is given by the pseudo-inverse,

    x_LS = A^† b = (A^T A)^{-1} A^T b.

As mentioned before, the inverse of a matrix is typically not easy to compute, so in practice the normal equations A^T A x_LS = A^T b are solved via the QR, LU, or SVD decompositions.

Note that the 2-norm of a vector is invariant under orthogonal transformations. Suppose A = QR is the QR decomposition; then

    Q^T A = R = ( R_1 )  n
                ( 0   )  m − n,

where R_1 is square upper triangular, and

    Q^T b = ( c )  n
            ( d )  m − n,

so

    ||Ax − b||_2^2 = ||Q^T Ax − Q^T b||_2^2 = ||R_1 x − c||_2^2 + ||d||_2^2.

Thus x_LS ∈ R^n can readily be obtained by back substitution in the upper triangular system R_1 x = c. Once the QR decomposition of A has been computed, by Householder reflections or any other method, the full-rank LS problem can be solved by the above procedure.

An alternative method is to solve A^T A x_LS = A^T b by an LU decomposition of A^T A. Since C = A^T A is symmetric positive definite, it has a Cholesky decomposition C = GG^T, and solving the two triangular systems

    Gy = A^T b,  G^T x_LS = y

gives the LS solution. If A does not have full column rank, A^T A may not be invertible; in that case we can use the SVD of A to solve the LS problem.

Theorem 7 Let A ∈ R^{m×n} with SVD A = UΣV^T and rank(A) = r. If U = (u_1, ..., u_m) and V = (v_1, ..., v_n) are column partitions and b ∈ R^m, then the (minimum-norm) LS solution to Ax = b is

    x_LS = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i.

Proof 3

    ||Ax − b||_2^2 = ||(U^T A V)(V^T x) − U^T b||_2^2 = ||Σα − U^T b||_2^2
                   = Σ_{i=1}^{r} (σ_i α_i − u_i^T b)^2 + Σ_{i=r+1}^{m} (u_i^T b)^2,

where α = V^T x. Clearly only the first sum depends on x, so α = (u_1^T b / σ_1, ..., u_r^T b / σ_r, 0, ..., 0)^T minimizes the fit, and thus

    x_LS = Vα = Σ_{i=1}^{r} (u_i^T b / σ_i) v_i.

In addition, given the SVD, the pseudo-inverse of A ∈ R^{m×n} is defined as A^† = VΣ^†U^T ∈ R^{n×m}, where

    Σ^† = diag(1/σ_1, ..., 1/σ_r, 0, ..., 0) ∈ R^{n×m}.
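The three routes to the full-rank LS solution described above agree, as the following sketch (illustrative only) verifies against NumPy's reference solver:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(7)
    A = rng.standard_normal((20, 4))
    b = rng.standard_normal(20)

    # (a) QR: back substitution in R_1 x = c, with c the first n entries of Q^T b
    Q1, R1 = np.linalg.qr(A)
    x_qr = np.linalg.solve(R1, Q1.T @ b)

    # (b) normal equations via the Cholesky factor of A^T A
    x_chol = cho_solve(cho_factor(A.T @ A), A.T @ b)

    # (c) SVD / pseudo-inverse: x = sum_i (u_i^T b / sigma_i) v_i
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    x_svd = Vt.T @ ((U.T @ b) / s)

    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
    assert np.allclose(x_qr, x_ref) and np.allclose(x_chol, x_ref) and np.allclose(x_svd, x_ref)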

6 Schur Decomposition and Eigenvalue Decomposition

Given a square matrix A, we are interested in the simplest form B, in C or R, attainable under a unitary (orthogonal) similarity transform A = QBQ^H or a general similarity transform A = XBX^{-1}. The matrix B reveals the intrinsic information of A, since many attributes and structures of a matrix are invariant under similarity transforms.

Definition 7 Let A ∈ C^{n×n}. If there exists a non-zero vector x ∈ C^n satisfying Ax = λx with λ ∈ C, then λ is called an eigenvalue of A and x is referred to as an eigenvector. The eigenvalues are the n roots of A's characteristic polynomial det(λI − A), and the set of eigenvalues is also called the spectrum of A. The sum of the diagonal elements of A is referred to as the trace of A,

    trace(A) = Σ_{i=1}^{n} a_ii = Σ_{i=1}^{n} λ_i.

6.1 Schur Decomposition

Theorem 8 Schur Decomposition. Let A ∈ C^{n×n}. Then there exists a unitary Q ∈ C^{n×n} such that

    Q^H A Q = T = D + N,

where D = diag(λ_1, ..., λ_n) and N is strictly upper triangular. If Q = (q_1, ..., q_n) is a column partitioning of the unitary matrix Q, the q_i are referred to as Schur vectors, and from AQ = QT the Schur vectors satisfy

    Aq_k = λ_k q_k + Σ_{i=1}^{k−1} n_ik q_i,  k = 1 : n.

Proof 4 The theorem obviously holds for n = 1. Suppose λ is an eigenvalue of A and Ax = λx with x ∈ C^n a unit vector. Then x can be extended to a unitary matrix U = (x, u_2, ..., u_n), and

    AU = (Ax, Au_2, ..., Au_n) = (λx, Au_2, ..., Au_n) = U ( λ  ω^T )
                                                            ( 0  C   ).

Suppose the theorem holds for matrices of order n−1, so there is a unitary Ũ such that Ũ^H C Ũ is upper triangular. Then, letting Q = U diag(1, Ũ), it is easy to verify that the theorem holds for order n.

For a real matrix A, the eigenvalues are either real or occur in complex conjugate pairs. In order to operate entirely with real numbers, T is relaxed to a block upper triangular matrix with 1-by-1 or 2-by-2 diagonal blocks, which is called the real Schur decomposition.

Theorem 9 Real Schur Decomposition. Let A ∈ R^{n×n}. Then there exists an orthogonal Q ∈ R^{n×n} such that

    Q^T A Q = R = ( R_11  R_12  ...  R_1m )
                  ( 0     R_22  ...  R_2m )
                  ( ...               ... )
                  ( 0     0     ...  R_mm ),

where each R_ii is either a 1-by-1 matrix or a 2-by-2 matrix having complex conjugate eigenvalues.

Proof 5 The theorem obviously holds for n = 1. Let A ∈ R^{n×n}. If A has a real eigenvalue λ, then A can be block triangularized and reduced to order n−1 as in the proof of the Schur decomposition. If A has a pair of complex conjugate eigenvalues λ_{1,2} = α ± iβ, it is easy to see that the corresponding eigenvectors are also complex conjugate, x_{1,2} = y ± iz, where y and z are real vectors, and

    A(y + iz) = (α + iβ)(y + iz)  ⟹  A (y z) = (y z) ( α   β )
                                                      ( −β  α ).

β ≠ 0 implies that y and z are linearly independent, so by the Gram-Schmidt process we can extend them to an orthogonal Q = (q_1, q_2, q_3, ..., q_n) whose first two columns span span{y, z}, such that

    Q^T A Q = ( R_11  R_12 )
              ( 0     R_22 ),

where R_11 is a 2-by-2 matrix with eigenvalues λ_{1,2} = α ± iβ. By induction the theorem holds.

Corollary 3 A is normal, namely A^H A = AA^H, if and only if there exists a unitary Q ∈ C^{n×n} such that Q^H A Q = diag(λ_1, ..., λ_n).

Corollary 4 If A is a real symmetric matrix, then there exists an orthogonal Q ∈ R^{n×n} such that Q^T A Q = diag(λ_1, ..., λ_n). Indeed, consider the real Schur decomposition of the symmetric A: R is also symmetric, and the eigenvalues of 2-by-2 symmetric matrices are real, thus A can be diagonalized.
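SciPy computes the real Schur form directly; the sketch below (illustrative) exhibits the orthogonal similarity and the quasi-triangular structure:

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(8)
    A = rng.standard_normal((5, 5))
    T, Q = schur(A, output='real')       # Q orthogonal, T quasi upper triangular

    assert np.allclose(Q @ T @ Q.T, A)
    assert np.allclose(Q.T @ Q, np.eye(5))
    # 1-by-1 diagonal blocks of T hold real eigenvalues; 2-by-2 blocks hold conjugate pairs
    print(np.round(T, 3))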

6.2 Eigenvalue Decomposition

Theorem 10 Block Diagonal Decomposition. Let A ∈ C^{n×n} with a Schur decomposition

    Q^H A Q = T = ( T_11  T_12  ...  T_1q )
                  ( 0     T_22  ...  T_2q )
                  ( ...               ... )
                  ( 0     0     ...  T_qq ),

and assume that the T_ii are square and that the eigenvalues of T_ii and T_jj are distinct whenever i ≠ j. Then there exists a nonsingular matrix Y ∈ C^{n×n} such that

    (Y^{-1} Q^H) A (QY) = diag(T_11, ..., T_qq).

For a matrix A ∈ C^{n×n}, the multiplicity of the eigenvalue λ_i as a root of the characteristic polynomial is referred to as the algebraic multiplicity of λ_i, and the dimension of null(λ_i I − A) is called the geometric multiplicity of λ_i, which is the number of linearly independent eigenvectors associated with λ_i.

Corollary 5 Diagonal Decomposition. Let A ∈ C^{n×n}. There exists a non-singular X ∈ C^{n×n} that diagonalizes A,

    X^{-1} A X = diag(λ_1, ..., λ_n),

if and only if the geometric multiplicity of every eigenvalue λ_i equals its algebraic multiplicity.

Theorem 11 Jordan Decomposition. Let A ∈ C^{n×n}. Then there exists a non-singular X ∈ C^{n×n} such that X^{-1} A X = diag(J_1, ..., J_t), where each Jordan block J_i is an m_i-by-m_i matrix with λ_i on the diagonal, ones on the superdiagonal, and zeros elsewhere,

    J_i = ( λ_i  1            )
          (      λ_i  ...     )
          (           ...  1  )
          (                λ_i ),

and m_1 + ... + m_t = n.
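For a diagonalizable matrix, Corollary 5 and the trace identity above are easy to confirm numerically (a small sketch with a symmetric, hence diagonalizable, example):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])                      # symmetric, hence diagonalizable
    lam, X = np.linalg.eig(A)

    assert np.allclose(X @ np.diag(lam) @ np.linalg.inv(X), A)   # A = X diag(lambda) X^{-1}
    assert np.isclose(np.trace(A), lam.sum())                    # trace(A) equals the sum of eigenvalues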


6.3 Hessenberg Decomposition

Theorem 12 Hessenberg Decomposition. Let A ∈ R^{n×n}. Then there exists an orthogonal matrix Q ∈ R^{n×n} such that Q^T A Q = H, where H is a Hessenberg matrix, i.e. all elements below the sub-diagonal are zero.

Proof 6 We claim that Q is a product of n−2 Householder matrices P_1, ..., P_{n−2}. Let α = (a_21, ..., a_n1)^T; we can find a Householder reflection P'_1 of order n−1 such that P'_1 α has zeros in all but its first entry. Let P_1 = diag(1, P'_1); since Householder matrices are symmetric, P_1 is symmetric, and

    P_1 A P_1 = ( 1  0    ) ( a_11  ω^T  ) ( 1  0    )   ( a_11      ω^T P'_1       )
                ( 0  P'_1 ) ( α     A_22 ) ( 0  P'_1 ) = ( P'_1 α    P'_1 A_22 P'_1 )

has zeros below the sub-diagonal in its first column. Now suppose step k−1 has been completed, i.e. we have found Householder matrices P_1, ..., P_{k−1} such that

    (P_1 ... P_{k−1})^T A (P_1 ... P_{k−1}) = ( B_11  B_12  B_13 )
                                              ( B_21  b_22  B_23 )
                                              ( 0     B_32  B_33 )

is upper Hessenberg through its first k−1 columns, where B_32 is a vector with n−k elements. We can find a Householder matrix P'_k of order n−k that zeroes all elements of B_32 except the first; letting P_k = diag(I_k, P'_k),

    (P_1 ... P_k)^T A (P_1 ... P_k) = ( B_11  B_12       B_13 P'_k       )
                                      ( B_21  b_22       B_23 P'_k       )
                                      ( 0     P'_k B_32  P'_k B_33 P'_k )

is upper Hessenberg through its first k columns. By induction, the theorem holds.

If the matrix A is symmetric, the Hessenberg decomposition yields a tri-diagonal form of A, i.e. Q^T A Q = H with h_ij = 0 whenever |i − j| > 1; this can be verified by using the symmetry (ω = α, B_23 = B_32^T, and so on) in the above proof.

The companion matrix decomposition is a non-orthogonal (non-unitary, in the complex domain) analogue of the Hessenberg decomposition, much as the Jordan decomposition relates to the Schur decomposition. A companion matrix has one of the following forms (or their transposes), which can be derived easily from the characteristic polynomial det(λI − C) = c_0 + c_1 λ + ... + c_{n−1} λ^{n−1} + λ^n:

    C = ( 0  0  ...  0  −c_0     )         C = ( −c_{n−1}  −c_{n−2}  ...  −c_1  −c_0 )
        ( 1  0  ...  0  −c_1     )             ( 1          0         ...  0     0    )
        ( 0  1  ...  0  −c_2     )    or       ( 0          1         ...  0     0    )
        ( ...                    )             ( ...                                  )
        ( 0  0  ...  1  −c_{n−1} )             ( 0          0         ...  1     0    )

The Schur decomposition is an important means of computing eigenvalues. A practical iteration scheme based on the Hessenberg and QR decompositions, called the QR iteration, is as follows:

    H_0 = U_0^T A U_0            (Hessenberg decomposition)
    for k = 1, 2, ...
        H_{k−1} = U_k R_k        (QR decomposition)
        H_k = R_k U_k

The QR iteration (in practice combined with shifts and deflation) converges to the Schur decomposition of A; please refer to [3] for details.
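A minimal unshifted version of this iteration can be written in a few lines (a sketch only: practical implementations add shifts and deflation, and convergence of the plain iteration needs eigenvalues of distinct magnitude):

    import numpy as np
    from scipy.linalg import hessenberg

    A = np.array([[4.0, 1.0, 2.0],
                  [1.0, 3.0, 1.0],
                  [2.0, 1.0, 2.0]])       # symmetric, eigenvalues of distinct magnitude

    H = hessenberg(A)                     # H_0 = U_0^T A U_0 (tridiagonal here, since A is symmetric)
    for _ in range(100):                  # unshifted QR iteration
        Q, R = np.linalg.qr(H)            # H_{k-1} = Q R
        H = R @ Q                         # H_k = R Q is similar to H_{k-1}

    assert np.allclose(np.sort(np.diag(H)), np.linalg.eigvalsh(A), atol=1e-8)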

7 Biconjugate Decomposition

7.1 Biconjugate Decomposition

A variety of matrix decomposition processes can be unified by the Wedderburn rank-one reduction theorem [6], including the Gram-Schmidt orthogonalization process and the LU, QR, and SVD decompositions.

Theorem 13 If A ∈ R^{m×n}, and x ∈ R^n and y ∈ R^m are vectors such that ω = y^T A x ≠ 0, then the matrix

    B := A − ω^{-1} A x y^T A

has rank exactly one less than the rank of A.

Proof 7 We show that the dimension of B's null space is one larger than that of A. For any z ∈ null(A), i.e. Az = 0, we get Bz = 0, so null(A) ⊆ null(B). Conversely, for any z ∈ null(B), 0 = Bz = Az − ω^{-1} A x (y^T A z); letting k = ω^{-1} y^T A z, which is a scalar, we have A(z − kx) = 0, so z − kx ∈ null(A). Since Ax ≠ 0, x ∉ null(A), and the null space of B is therefore obtained from that of A by adding x to its basis, which increases its dimension by one. Thus the rank of B is one less than that of A.

Suppose rank(A) = r. We can define a rank-reducing process that generates a sequence of Wedderburn matrices {A_k} by

    A_1 := A,  A_{k+1} := A_k − ω_k^{-1} A_k x_k y_k^T A_k

for any vectors x_k ∈ R^n and y_k ∈ R^m satisfying ω_k = y_k^T A_k x_k ≠ 0. The sequence terminates in r steps, since rank(A_k) decreases by exactly one at each step. This process can be summarized in matrix outer-product factorization form as

    A = Φ Ω^{-1} Ψ^T,                                   (1)

where Ω = diag(ω_1, ..., ω_r), Φ = (φ_1, ..., φ_r) ∈ R^{m×r} and Ψ = (ψ_1, ..., ψ_r) ∈ R^{n×r} with

    φ_k := A_k x_k,  ψ_k := A_k^T y_k.
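Theorem 13 and the rank-reducing sequence can be observed directly in a few lines (a sketch: random x_k, y_k satisfy ω_k ≠ 0 with probability one):

    import numpy as np

    rng = np.random.default_rng(10)
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))   # a rank-3 matrix in R^{5x4}
    Ak = A.copy()
    print(np.linalg.matrix_rank(Ak, tol=1e-8))                      # 3
    for _ in range(3):
        x = rng.standard_normal(4)
        y = rng.standard_normal(5)
        omega = y @ Ak @ x
        Ak = Ak - np.outer(Ak @ x, y @ Ak) / omega                  # Wedderburn rank-one reduction
        print(np.linalg.matrix_rank(Ak, tol=1e-8))                  # rank drops by one each step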

Equation (1) can further be written as

    A = (A_1 x_1, ..., A_r x_r) Ω^{-1} ( y_1^T A_1 )
                                       ( ...       )
                                       ( y_r^T A_r ).               (2)

Since every A_k can be expressed in terms of A, we can find U = (u_1, ..., u_r) ∈ R^{n×r} and V = (v_1, ..., v_r) ∈ R^{m×r} such that A u_k = A_k x_k and v_k^T A = y_k^T A_k, namely

    u_k := x_k − Σ_{i=1}^{k−1} (v_i^T A x_k / v_i^T A u_i) u_i,
    v_k := y_k − Σ_{i=1}^{k−1} (y_k^T A u_i / v_i^T A u_i) v_i.

Thus equation (1) can be rewritten as

    V^T A U = Ω,                                        (3)
    A = A U Ω^{-1} V^T A.                               (4)

The matrix decomposition process in equations (1), (3), and (4) is referred to as the biconjugate decomposition in [6]; it can be verified easily by substituting the Wedderburn matrices {A_k} and A_{r+1} = 0. The pair (U, V) is called an A-biconjugate pair, and (X, Y) is said to be A-biconjugatable. Depending on the initial matrix A and the choice of the vector sets (X, Y), a variety of factorizations can be derived from the biconjugate decomposition. Here we list the results for some well-known matrix decompositions; please refer to [6] for details.

Gram-Schmidt Let A be the identity matrix and let (X, Y) be identical and contain the vectors for which an orthogonal basis is desired; then U = V gives the resulting orthogonal basis.

LDM For A ∈ R^{n×n} of rank n, if the A-biconjugatable (X, Y) are both the identity matrix, (I, I), then equation (3) provides the unique LDM^T decomposition of A, namely A = V^{-T} Ω U^{-1} with V^{-T} and U^{-T} unit lower triangular matrices.

QR For A ∈ R^{n×n} of rank n, if the A-biconjugatable (X, Y) is (I, A), giving the biconjugate pair (U, V) = (R_1^{-1}, QΨ) and Ω = Ψ^2 in equation (3), where Ψ is a diagonal matrix and R_1 is a unit upper triangular matrix, then R = ΨR_1 and Q give the QR decomposition A = QR.

SVD For A ∈ R^{m×n} of rank r with SVD A = UΣV^T, the A-biconjugatable (X, Y) is (V, U).

7.2 Summary

For A ∈ C^{m×n}, the decompositions discussed in this tutorial, together with the conditions under which they apply, can be summarized as follows:

    SVD: A = UΣV^H
    QR: A = QR
    Schur (square): A = QTQ^H; Jordan (square, non-unitary similarity): A = XJX^{-1}; Eigen (diagonalizable): A = XΣX^{-1}
    Real Schur (real square): A = QRQ^T; Eigen (symmetric): A = XΣX^T
    Hessenberg (real square): A = QHQ^T; Companion (non-orthogonal similarity); Tri-diagonal (symmetric)
    LU (LDM^T) (non-singular square with nonzero leading principal minors): A = LU = LDM^T; LDL^T (symmetric): A = LDL^T
    Cholesky (symmetric positive definite): A = GG^T

References

[1] E. W. Weisstein, "Matrix decomposition," MathWorld–A Wolfram Web Resource. [Online]. Available: http://mathworld.wolfram.com/MatrixDecomposition.html

[2] R. Bronson, Matrix Methods: An Introduction, 2nd ed. Academic Press, 1991.

[3] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Johns Hopkins University Press, 1996.

[4] L. Hubert, J. Meulman, and W. Heiser, "Two purposes for matrix factorization: A historical appraisal," SIAM Review, vol. 42, no. 1, pp. 68–82, 2000.

[5] P. Lancaster and M. Tismenetsky, The Theory of Matrices, 2nd ed., W. Rheinboldt, Ed. Academic Press, 1985.

[6] M. T. Chu, R. E. Funderlic, and G. H. Golub, "A rank-one reduction formula and its applications to matrix factorizations," SIAM Review, vol. 37, no. 4, pp. 512–530, 1995.

