20) Matrix condition number and SVD#

Last time#

  • Solving Systems

  • Review

Today#

  1. Condition Number of a Matrix

  2. Least squares and normal equations

  3. Intro to SVD

  4. Geometry of the Singular Value Decomposition

using LinearAlgebra
using Plots
default(linewidth=4, legendfontsize=12)

1. Condition Number of a Matrix#

We may have informally referred to a matrix as “ill-conditioned” when the columns are nearly linearly dependent, but let’s make this concept more precise. Recall the definition of (relative) condition number:

\[ \kappa = \max_{\delta x} \frac{|\delta f| / |f|}{|\delta x| / |x|} . \]

We understood this definition for scalar problems, but it also makes sense when the inputs and/or outputs are vectors (or matrices, etc.) and absolute value is replaced by vector (or matrix) norms. Consider matrix-vector multiplication, for which \(f(x) = A x\).

\[ \kappa(A) = \max_{\delta x} \frac{\lVert A (x+\delta x) - A x \rVert/\lVert A x \rVert}{\lVert \delta x\rVert/\lVert x \rVert} = \max_{\delta x} \frac{\lVert A \delta x \rVert}{\lVert \delta x \rVert} \, \frac{\lVert x \rVert}{\lVert A x \rVert} = \lVert A \rVert \frac{\lVert x \rVert}{\lVert A x \rVert} . \]

There are two problems here:

  • I wrote \(\kappa(A)\) but my formula depends on \(x\).

  • What is that \(\lVert A \rVert\) beastie?

Matrix norms induced by vector norms#

Vector norms are built into the linear space (and defined in terms of the inner product). Matrix norms are induced by vector norms, according to

\[ \lVert A \rVert = \max_{x \ne 0} \frac{\lVert A x \rVert}{\lVert x \rVert} . \]
  • This equation makes sense for non-square matrices – the vector norms of the input and output spaces may differ.

  • Due to linearity, all that matters is the direction of \(x\), so it could equivalently be written as

\[ \lVert A \rVert = \max_{\lVert x \rVert = 1} \lVert A x \rVert . \]
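
As a quick numerical check (a sketch using an arbitrary \(2\times 2\) example matrix, not one from the lecture), we can sample unit vectors and compare the largest \(\lVert A x \rVert\) against the induced 2-norm computed by opnorm:

A = [1 2; 3 4]  # arbitrary example matrix
thetas = LinRange(0, pi, 1000)
# sample unit vectors x = [cos θ, sin θ] and record ‖A x‖
norms = [norm(A * [cos(t), sin(t)]) for t in thetas]
@show maximum(norms)
@show opnorm(A)  # induced 2-norm from LinearAlgebra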

The formula makes sense, but still depends on \(x\).#

\[\kappa(A) = \lVert A \rVert \frac{\lVert x \rVert}{\lVert Ax \rVert}\]

Consider the matrix

\[\begin{split} A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} . \end{split}\]
  • What is the norm of this matrix?

  • What is the condition number when \(x = [1,0]^T\)?

  • What is the condition number when \(x = [0,1]^T\)?
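
A sketch for experimenting with these questions, using the matrix and vectors from the bullets above:

A = [1 0; 0 0]
x = [1, 0]
@show opnorm(A) * norm(x) / norm(A * x)  # finite for this x
x = [0, 1]
@show norm(A * x)  # A x = 0, so the ratio above blows up for this x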

Condition number of the matrix#

The condition number of matrix-vector multiplication depends on the vector. The condition number of the matrix is the worst case (maximum) of the condition number over all vectors, i.e.,

\[ \kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert x \rVert}{\lVert A x \rVert} .\]

If \(A\) is invertible, then we can rephrase as

\[ \kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert A^{-1} (A x) \rVert}{\lVert A x \rVert} = \max_{A x \ne 0} \lVert A \rVert \frac{\lVert A^{-1} (A x) \rVert}{\lVert A x \rVert} = \lVert A \rVert \lVert A^{-1} \rVert . \]

Evidently multiplying by a matrix is just as ill-conditioned an operation as solving a linear system with that matrix.
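
We can check this identity numerically; the matrix below is an arbitrary ill-conditioned example (its columns are nearly parallel), not one from the lecture:

A = [1 1000; 0 1]  # nearly linearly dependent columns
@show opnorm(A) * opnorm(inv(A))
@show cond(A)  # built-in condition number (2-norm by default)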

2. Least squares and the normal equations#

A least squares problem takes the form: given an \(m\times n\) matrix \(A\) (\(m \ge n\)), find \(x\) such that

\[ \lVert Ax - b \rVert \]
is minimized. If \(A\) is square and full rank, the minimizer satisfies \(A x - b = 0\), but in general \(b\) is not in the range of \(A\), so the residual cannot be made zero. At the minimizer, the residual \(A x - b\) is orthogonal to the range of \(A\).

So if \(b\) is in the range of \(A\), we can solve \(A x = b\). If not, we need only orthogonally project \(b\) onto the range of \(A\).
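
A sketch verifying the orthogonality of the residual with a random rectangular system (Julia’s backslash computes the least squares solution for rectangular \(A\)):

A = randn(10, 3)
b = randn(10)
x = A \ b                     # least squares solution
@show norm(A' * (A * x - b))  # residual is orthogonal to the columns of A (up to roundoff)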

Solution by QR (Householder triangulation)#

Given the reduced QR factorization \(A = Q R\) (computed by Householder triangulation), the least squares solution satisfies the triangular system \(R x = Q^T b\).

This method is stable and accurate.
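
A sketch of this approach with a random problem; qr returns Householder reflectors, so applying \(Q^T\) and keeping the first \(n\) entries gives the right-hand side for the triangular solve:

m, n = 20, 4
A = randn(m, n)
b = randn(m)
F = qr(A)                 # Householder QR
c = (F.Q' * b)[1:n]       # Q^T b, restricted to the first n entries
x_qr = F.R \ c            # triangular solve R x = Q^T b
@show norm(x_qr - A \ b)  # compare with Julia's built-in least squares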

Solution by Cholesky#

The mathematically equivalent form \((A^T A) x = A^T b\) is known as the normal equations. The solution process involves factoring the symmetric and positive definite \(n\times n\) matrix \(A^T A\).

The product \(A^T A\) squares the condition number, \(\kappa(A^T A) = \kappa(A)^2\), which can reduce the accuracy of a least squares solution.
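
A sketch of the normal-equations approach and the squared condition number, using a random matrix:

A = randn(20, 4)
b = randn(20)
@show cond(A), cond(A' * A)      # κ(AᵀA) ≈ κ(A)²
F = cholesky(Symmetric(A' * A))  # factor the SPD Gram matrix
x_chol = F \ (A' * b)
@show norm(x_chol - A \ b)       # compare with Julia's built-in least squares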

Solution by Singular Value Decomposition#

Next, we will discuss a factorization

\[ U \Sigma V^T = A \]
where \(U\) and \(V\) have orthonormal columns and \(\Sigma\) is diagonal with nonnegative entries. The entries of \(\Sigma\) are called singular values and this decomposition is the singular value decomposition (SVD).

Singular values of a linear operator (matrix \(A\)) are the square roots of the (necessarily non-negative) eigenvalues of the self-adjoint operator \(A^{*}A\) (where \(A^{*}\) denotes the adjoint of \(A\).)
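
We can verify this relationship numerically with a random matrix; eigvals returns eigenvalues in ascending order, so we reverse them to match the descending singular values:

A = randn(5, 3)
@show svdvals(A)
@show sqrt.(reverse(eigvals(Symmetric(A' * A))))  # should match the singular values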

It may remind you of an eigenvalue decomposition \(X \Lambda X^{-1} = A\), but

  • the SVD exists for all matrices (including non-square and rank-deficient matrices),

  • \(U,V\) have orthonormal columns (while \(X\) can be arbitrarily ill-conditioned).

  • Indeed, if a matrix is symmetric and positive definite (all positive eigenvalues), then \(U=V\) and \(\Sigma = \Lambda\).

Computing an SVD requires a somewhat complicated iterative algorithm, but a crude estimate of the cost is \(2 m n^2 + 11 n^3\). Note that this is similar to the cost of \(QR\) when \(m \gg n\), but much more expensive for square matrices.
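
Given the thin SVD above, a sketch of solving the least squares problem via \(x = V \Sigma^{-1} U^T b\) (random problem data):

A = randn(20, 4)
b = randn(20)
U, S, V = svd(A)              # thin SVD: U is 20×4, S has 4 entries
x_svd = V * ((U' * b) ./ S)   # x = V Σ⁻¹ Uᵀ b
@show norm(x_svd - A \ b)     # compare with Julia's built-in least squares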

4. Geometry of the Singular Value Decomposition#

default(aspect_ratio=:equal)

function peanut()
    theta = LinRange(0, 2*pi, 50)
    r = 1 .+ .4*sin.(3*theta) + .6*sin.(2*theta)
    r' .* [cos.(theta) sin.(theta)]'
end

function circle()
    theta = LinRange(0, 2*pi, 50)
    [cos.(theta) sin.(theta)]'
end

function Aplot(A)
    "Plot a transformation from X to Y, first applied to a peanut shape and then to a circle"
    X = peanut()
    Y = A * X
    p = scatter(X[1,:], X[2,:], label="in")
    scatter!(p, Y[1,:], Y[2,:], label="out")
    X = circle()
    Y = A * X
    q = scatter(X[1,:], X[2,:], label="in")
    scatter!(q, Y[1,:], Y[2,:], label="out")
    plot(p, q, layout=2)
end
Aplot (generic function with 1 method)

Multiplication by a scalar: \(\alpha I\)#

Perhaps the simplest transformation is a scalar multiple of the identity.

Aplot(1.1*I)

Diagonal matrices#

We get the same effect if we apply a diagonal matrix \(D\) whose entries are all equal:

Aplot([2 0; 0 2]) # In this case D = 2 I

The diagonal entries can have different magnitudes. For example:

\[\begin{split} A = \begin{bmatrix} 2 & 0 \\ 0 & .5 \end{bmatrix}\end{split}\]
Aplot([2 0; 0 .5])

In this case, the circle becomes an ellipse aligned with the coordinate axes (with semi-axes \(a = 2 = a_{11}\) and \(b = 0.5 = a_{22}\)).

Givens Rotation (as example of orthogonal matrix)#

We can rotate the input using a \(2\times 2\) matrix, parametrized by \(\theta\). Its transpose rotates in the opposite direction.

function givens(theta)
    s = sin(theta)
    c = cos(theta)
    [c -s; s c]
end

G = givens(0.3)
Aplot(G)

Let’s look at the effect of its transpose:

Aplot(G')

Reflections#

We’ve previously seen that reflectors have the form \(F = I - 2 v v^T\) where \(v\) is a normalized vector. Reflectors satisfy \(F^T F = I\) and \(F = F^T\), thus \(F^2 = I\) (i.e., \(F\) is an involution: applying it twice returns the original).

function reflect(theta)
    v = [cos(theta), sin(theta)]
    I - 2 * v * v'
end

Aplot(reflect(0.3))

Singular Value Decomposition#

The SVD is \(A = U \Sigma V^T\) where \(U\) and \(V\) have orthonormal columns and \(\Sigma\) is diagonal with nonnegative entries. It exists for any matrix (non-square, singular, etc.). If we think of orthogonal matrices as reflections/rotations, this says any matrix can be represented by the sequence of operations: reflect/rotate, diagonally scale, and reflect/rotate again.

Let’s try a random symmetric matrix.

A = randn(2, 2)
A += A' # make symmetric
@show det(A) # Positive means orientation is preserved
Aplot(A)
det(A) = -0.4105648812213614
U, S, V = svd(A) # using the Julia built-in
@show norm(U * diagm(S) * V' - A) # Should be zero
Aplot(V') # Rotate/reflect in preparation for scaling
norm(U * diagm(S) * V' - A) = 4.1563916563713656e-16
  • What visual features indicate that this is a symmetric matrix?

  • Is the orthogonal matrix a reflection or rotation?

    • Does this change when the determinant is positive versus negative? (Rerun the cell above as needed.)

Let’s look at the parts of the SVD#

Aplot(diagm(S)) # scale along axes
Aplot(U) # rotate/reflect back

Putting it together#

Aplot(U * diagm(S) * V') # SVD

Observations#

  • The circle always maps to an ellipse

  • The \(U\) and \(V\) factors may reflect even when \(\det A > 0\)