Solving eigenvalue problems numerically

Numerical Analysis

Urbain Vaes

NYU Paris

Objective

Goal of this chapter

Study numerical methods to find the eigenpairs \((\mathbf{\boldsymbol{v}}_i, \lambda_i)\in \mathbf C^n \times \mathbf C\) satisfying \[ \mathsf A \mathbf{\boldsymbol{v}}_i= \lambda_i \mathbf{\boldsymbol{v}}_i, \] where \(\mathsf A \in \mathbf C^{n \times n}\).

\(~\)

  • Simplifying assumption throughout this chapter: the matrix \(\mathsf A\) is diagonalizable.
  • Notation: the eigenvalues are ordered by decreasing modulus, \(|\lambda_1| \geqslant\dotsc \geqslant|\lambda_n|\), with normalized eigenvectors \(\mathbf{\boldsymbol{v}}_1, \dotsc, \mathbf{\boldsymbol{v}}_n\) (see the numerical check after this list). \[ \mathsf A \mathsf V = \mathsf V \mathsf D, \qquad \mathsf V = \begin{pmatrix} \mathbf{\boldsymbol{v}}_1 & \dots & \mathbf{\boldsymbol{v}}_n \end{pmatrix}, \qquad \mathsf D = {\rm diag}(\lambda_1, \dotsc, \lambda_n). \]
  • Finding all the eigenvalues of a matrix is expensive, even with the best methods:

    using LinearAlgebra
    A = rand(2000, 2000)
    @time eigen(A)
    6.694395 seconds

    \({\color{green} \rightarrow}\) We first focus on methods for finding just one or a few eigenpairs.
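A sanity check of this notation with the built-in solver (a minimal sketch; note that eigen sorts the eigenvalues in ascending order, the reverse of the modulus ordering used here):

    using LinearAlgebra
    A = [2.0 1.0; 1.0 2.0]
    λ, V = eigen(A)          # eigenvalues and matrix of eigenvectors
    A * V ≈ V * Diagonal(λ)  # true: precisely the relation A V = V D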

Program of today

  • Simple vector iterations

    calculate just one eigenpair

  • Simultaneous iteration

    calculate multiple eigenpairs at the same time

  • Application: PageRank

    find the invariant distribution of a Markov chain

Remark: can we calculate the roots of the characteristic polynomial?

The cost of computing the characteristic polynomial via cofactor (Leibniz) expansion of \(\det(\mathsf A - \lambda \mathsf I)\) scales as \({\color{red}n!}\)

\({\color{green} \rightarrow}\) not a viable approach…
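To get a sense of the scale: the Leibniz formula for \(\det(\mathsf A - \lambda \mathsf I)\) sums over all \(n!\) permutations of the matrix indices, which is astronomical already for moderate \(n\):

    factorial(big(20))  # 2432902008176640000 terms: ≈ 77 years at 10^9 terms per second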

The power iteration

Power iteration

The power iteration is a method for finding \(\color{blue}(\lambda_1, v_1)\): \[ \fcolorbox{blue}{lightgreen}{$ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf A \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf A \mathbf{\boldsymbol{x}}_{k}} \rVert}} $} \] Given \(\mathbf{\boldsymbol{x}}_k \approx \mathbf{\boldsymbol{v}}_1\), the eigenvalue is calculated from the Rayleigh quotient: \[ \fcolorbox{blue}{lightgreen}{$ \lambda_{1} \approx \frac{\mathbf{\boldsymbol{x}}_{k}^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k} $} \]

  • Normalization is necessary to avoid floating-point overflow to Inf (when \(|\lambda_1| > 1\)) or underflow to zero (when \(|\lambda_1| < 1\)).
A = [2 1; 1 2]
x = rand(2)
for i in 1:20
    x = A*x
    x /= sqrt(x'x)  # normalize to unit norm
end
println("Eigenvector: $x, eigenvalue: $(x'A*x)")
Eigenvector: [0.7071067812838212, 0.7071067810892738], eigenvalue: 3.0
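For this small example we can cross-check the result against the built-in solver (with LinearAlgebra loaded as before; a sanity check only, not part of the method):

eigen(A).values  # [1.0, 3.0]: the dominant eigenvalue is indeed 3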

Convergence of the power iteration

Notation: let \(\angle\) denote the acute angle between \(\mathbf{\boldsymbol{x}}\) and \(\mathbf{\boldsymbol{y}}\): \[ \angle(\mathbf{\boldsymbol{x}}, \mathbf{\boldsymbol{y}}) = \arccos\left( \frac{{\lvert {\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{y}}} \rvert}}{\sqrt{\mathbf{\boldsymbol{x}}^* \mathbf{\boldsymbol{x}}} \sqrt{\mathbf{\boldsymbol{y}}^* \mathbf{\boldsymbol{y}}}}\right). \]
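In code, this angle may be computed as follows; angle_between is a helper name introduced here, and the min(·, 1) guards against rounding errors pushing the argument of arccos slightly above 1:

using LinearAlgebra
# Acute angle between x and y, as defined above
angle_between(x, y) = acos(min(abs(dot(x, y)) / (norm(x) * norm(y)), 1))
angle_between([1, 0], [1, 1])  # π/4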

Since the eigenvectors of \(\mathsf A\) span \(\mathbf C^n\), any vector \(\mathbf{\boldsymbol{x}}_0\) may be decomposed as \[ \mathbf{\boldsymbol{x}}_0 = \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \alpha_n \mathbf{\boldsymbol{v}}_n. \]

Proposition: convergence of the power iteration started from \(\mathbf{\boldsymbol{x}}_0\)

Suppose that \({\lvert {\lambda_1} \rvert} > {\lvert {\lambda_2} \rvert}\) and \(\alpha_1 \neq 0\). Then \((\mathbf{\boldsymbol{x}}_k)_{k \geqslant 0}\) satisfies \[ \lim_{k \to \infty} \angle(\mathbf{\boldsymbol{x}}_k, \mathbf{\boldsymbol{v}}_1) = 0. \] The sequence \((\mathbf{\boldsymbol{x}}_k)\) is said to converge essentially to \(\mathbf{\boldsymbol{v}}_1\). \(~\)

Proof

By construction \[ \mathbf{\boldsymbol{x}}_k = \frac{\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n}{{\lVert {\lambda_1^k \alpha_1 \mathbf{\boldsymbol{v}}_1 + \dotsb + \lambda_n^k \alpha_n \mathbf{\boldsymbol{v}}_n} \rVert}} = \frac{\lambda_1^k \alpha_1}{|\lambda_1^k \alpha_1|} \frac{\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n }{\left\lVert\mathbf{\boldsymbol{v}}_1 + \frac{\lambda_2^k \alpha_2}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_2 + \dotsb + \frac{\lambda_n^k \alpha_n}{\lambda_1^k \alpha_1} \mathbf{\boldsymbol{v}}_n\right\rVert}. \] Since \(|\lambda_j/\lambda_1| \leqslant |\lambda_2/\lambda_1| < 1\) for all \(j \geqslant 2\), the correction terms vanish as \(k \to \infty\). Therefore \[ \mathbf{\boldsymbol{x}}_k \left(\frac{|\lambda_1^k \alpha_1|}{\lambda_1^k \alpha_1}\right) \xrightarrow[k \to \infty]{} \mathbf{\boldsymbol{v}}_1. \] (Notice that the bracketed factor on the left-hand side is a scalar of modulus 1.)

The dominant error term scales as \(\color{red} |\lambda_2/\lambda_1|^k\).
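A quick numerical illustration of this rate, reusing the angle_between helper defined above: for A = [2 1; 1 2] we have \(|\lambda_2/\lambda_1| = 1/3\), so the angle to \(\mathbf{\boldsymbol{v}}_1\) should shrink by a factor of about 3 per iteration.

A = [2 1; 1 2]
v1 = [1, 1] / sqrt(2)  # dominant eigenvector, with eigenvalue 3
x = [1.0, 0.0]
for k in 1:5
    x = A*x / norm(A*x)
    println(angle_between(x, v1))  # decays roughly like (1/3)^k
end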

Inverse and Rayleigh quotient iteration

Suppose that \(\mu \notin \sigma(\mathsf A)\) and \(|\lambda_J - \mu| < |\lambda_K - \mu| \leqslant|\lambda_j - \mu|\) for all \(j \notin \{J, K\}\), i.e. \(\lambda_J\) is the unique eigenvalue closest to \(\mu\) and \(\lambda_K\) is the second closest.

Inverse iteration: to find \(\lambda_J\), the eigenvalue that is nearest \(\mu\)

Let \(\mathsf B = (\mathsf A - \mu \mathsf I)^{-1}\). The inverse iteration around \(\mu \in \mathbf C\) is given by \[ \fcolorbox{blue}{lightgreen}{$ \mathbf{\boldsymbol{x}}_{k+1} = \frac{\mathsf B \mathbf{\boldsymbol{x}}_{k}}{{\lVert {\mathsf B \mathbf{\boldsymbol{x}}_{k}} \rVert}}. $} \] Given \(\mathbf{\boldsymbol{x}}_k \approx \mathbf{\boldsymbol{v}}_J\), the eigenvalue is calculated from the Rayleigh quotient: \[ \fcolorbox{blue}{lightgreen}{$ \lambda_{J} \approx \frac{\mathbf{\boldsymbol{x}}_{k}^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k} $} \]

  • The dominant eigenvalue of \(\mathsf B\) is given by \((\lambda_J - \mu)^{-1}\).

  • The associated eigenvector is \(\mathbf{\boldsymbol{v}}_J\).

  • Use the \ operator instead of computing the inverse explicitly (see also the efficiency note after this code):

    A = [2 1; 1 2]
    x = rand(2)
    μ = 0
    B = A - μ*I
    for i in 1:10
        x = B\x
        x /= sqrt(x'x)  # normalize to unit norm
    end
    println("Eigenvector: $x, eigenvalue: $(x'A*x)")
    Eigenvector: [-0.7070650980198242, 0.7071484618962388], eigenvalue: 1.0000000069495358
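A remark on efficiency: each B\x above solves a linear system from scratch. Since \(\mu\) is fixed here, one can instead factorize \(\mathsf A - \mu \mathsf I\) once and reuse the factorization in every iteration; a minimal sketch:

    using LinearAlgebra
    A = [2 1; 1 2]
    x = rand(2)
    μ = 0
    F = lu(A - μ*I)      # one O(n³) factorization up front
    for i in 1:10
        x = F\x          # each solve is then only O(n²)
        x /= sqrt(x'x)   # normalize to unit norm
    end
    println("Eigenvector: $x, eigenvalue: $(x'A*x)")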

Can we adapt \(\mu\) during the iteration to accelerate convergence?

  • Error of the inverse iteration scales as \(\color{red} \left|\frac{\lambda_J - \mu} {\lambda_K - \mu}\right|^k\).

    \({\color{green} \rightarrow}\) suggests letting \[ \fcolorbox{blue}{lightgreen}{$ \mu_k = \frac{\mathbf{\boldsymbol{x}}_k^* \mathsf A \mathbf{\boldsymbol{x}}_k}{\mathbf{\boldsymbol{x}}_k^* \mathbf{\boldsymbol{x}}_k}. $} \]

    This leads to the Rayleigh quotient iteration.

  • If convergence occurs, the eigenvalue estimates \(\lambda^{(k)} := \mu_k\) converge cubically (for Hermitian matrices, as in the example below; quadratically in general):

    \[ |\lambda^{(k+1)} - \lambda_J| \leqslant C |\lambda^{(k)} - \lambda_J|^3. \]

    \({\color{green} \rightarrow}\) Number of significant digits is roughly tripled at each iteration.

A = [2 1; 1 2]
setprecision(500)
x = BigFloat.([1.2; 0.9])
μ = 0
for i in 1:6
    x = (A - μ*I)\x
    x /= sqrt(x'x)  # normalize to unit norm
    μ = x'A*x
    println(μ)
end
2.689655172413793214338566074331922284628319727818488081860106227371262853189736760002654726539617400119969038165484095334893321873413803868251681795864
2.9876835222760986149331492272467355588185696109799282648595173771965397075426195063225678761434065353075142377089820027363840168362403721691362285304077
2.9999995241744513496497052218657690803207923292634754813048708145726507175594474194389978335452546331521437999768268695077710762632721314300139734193988
2.9999999999999999999730670707803381925912042248826047017208189162200256216460941091027939474791840614585910098581290345505474157692691833334644060848503
2.9999999999999999999999999999999999999999999999999999999999951158299301653110614230820165226784762374074665384291286744975203198864866424226623297764997
3.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000012
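To make the tripling explicit, one can print the number of correct digits, roughly \(-\log_{10} |\mu_k - 3|\), instead of \(\mu_k\) itself; a variant of the loop above:

setprecision(500)
x = BigFloat.([1.2; 0.9])
μ = BigFloat(0)
for i in 1:6
    x = (A - μ*I)\x
    x /= sqrt(x'x)
    μ = x'A*x
    # roughly triples each step, until the 500-bit precision saturates
    println("Correct digits ≈ ", floor(Int, -log10(abs(μ - 3))))
end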