19) Matrix condition number and SVD#

Last time#

  • Solving Systems

  • Review

Today#

  1. Condition Number of a Matrix

  2. Least squares and normal equations

  3. Intro to SVD

  4. Geometry of the Singular Value Decomposition

using LinearAlgebra
using Plots
default(linewidth=4, legendfontsize=12)

1. Condition Number of a Matrix#

We may have informally referred to a matrix as “ill-conditioned” when its columns are nearly linearly dependent, but let’s make this concept more precise. Recall the definition of the (relative) condition number:

$$\kappa = \max_{\delta x} \frac{|\delta f| / |f|}{|\delta x| / |x|}.$$

We understood this definition for scalar problems, but it also makes sense when the inputs and/or outputs are vectors (or matrices, etc.) and absolute value is replaced by vector (or matrix) norms. Consider matrix-vector multiplication, for which f(x)=Ax.

$$\kappa(A) = \max_{\delta x} \frac{\lVert A(x+\delta x) - Ax \rVert / \lVert Ax \rVert}{\lVert \delta x \rVert / \lVert x \rVert} = \max_{\delta x} \frac{\lVert A \delta x \rVert}{\lVert \delta x \rVert} \, \frac{\lVert x \rVert}{\lVert Ax \rVert} = \lVert A \rVert \frac{\lVert x \rVert}{\lVert Ax \rVert}.$$

There are two problems here:

  • I wrote κ(A) but my formula depends on x.

  • What is that ‖A‖ beastie?

Matrix norms induced by vector norms#

Vector norms are built into the linear space (and defined in terms of the inner product). Matrix norms are induced by vector norms, according to

$$\lVert A \rVert = \max_{x \ne 0} \frac{\lVert Ax \rVert}{\lVert x \rVert}.$$
  • This equation makes sense for non-square matrices – the vector norms of the input and output spaces may differ.

  • Due to linearity, all that matters is the direction of x, so it could equivalently be written as

$$\lVert A \rVert = \max_{\lVert x \rVert = 1} \lVert Ax \rVert.$$
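
As a sanity check, here is a minimal sketch (with an arbitrary example matrix) that estimates this maximum by sampling random unit vectors and compares it with Julia’s built-in opnorm, which computes the induced 2-norm exactly:

A = [1 2; 3 4]                                 # arbitrary example matrix
xs = [normalize(randn(2)) for _ in 1:10_000]   # random unit vectors
@show maximum(norm(A * x) for x in xs)         # approaches ‖A‖ from below
@show opnorm(A)                                # exact induced 2-norm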

The formula makes sense, but still depends on x.#

$$\kappa(A) = \lVert A \rVert \frac{\lVert x \rVert}{\lVert Ax \rVert}$$

Consider the matrix

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
  • What is the norm of this matrix?

  • What is the condition number when x = [1, 0]ᵀ?

  • What is the condition number when x = [0, 1]ᵀ? (See the check below.)
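
Here is a minimal sketch checking both questions numerically; kappa is just a helper implementing the formula above.

kappa(A, x) = opnorm(A) * norm(x) / norm(A * x)   # κ(A, x) = ‖A‖ ‖x‖ / ‖Ax‖
A = [1 0; 0 0]
@show opnorm(A)           # the matrix norm
@show kappa(A, [1, 0])    # finite: multiplication is well-conditioned for this x
@show kappa(A, [0, 1])    # Inf: Ax = 0, so relative changes in the output blow up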

Condition number of the matrix#

The condition number of matrix-vector multiplication depends on the vector. The condition number of the matrix is the worst case (maximum) of the condition number for any vector, i.e.,

$$\kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert x \rVert}{\lVert Ax \rVert}.$$

If A is invertible, then we can rephrase as

$$\kappa(A) = \max_{x \ne 0} \lVert A \rVert \frac{\lVert A^{-1}(Ax) \rVert}{\lVert Ax \rVert} = \max_{Ax \ne 0} \lVert A \rVert \frac{\lVert A^{-1}(Ax) \rVert}{\lVert Ax \rVert} = \lVert A \rVert \, \lVert A^{-1} \rVert.$$

Evidently multiplying by a matrix is just as ill-conditioned an operation as solving a linear system using that matrix.
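
This identity is easy to check (a minimal sketch with an arbitrary invertible matrix); Julia’s cond computes the same quantity from the singular values.

A = [1 2; 3 4]
@show opnorm(A) * opnorm(inv(A))
@show cond(A)    # same value: κ(A) = ‖A‖ ‖A⁻¹‖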

2. Least squares and the normal equations#

A least squares problem takes the form: given an m×n matrix A (m ≥ n), find x such that

$$\lVert Ax - b \rVert$$

is minimized. If A is square and full rank, then this minimizer will satisfy Ax - b = 0, but that is not the case in general because b may not lie in the range of A. At the minimizer, the residual Ax - b must be orthogonal to the range of A.

So if b is in the range of A, we can solve Ax = b. If not, we need only orthogonally project b into the range of A.
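
Writing out the orthogonality of the residual to the range of A (it must be orthogonal to every column of A) gives the normal equations that appear below:

$$A^T (Ax - b) = 0 \quad \Longleftrightarrow \quad A^T A\, x = A^T b.$$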

Solution by QR (Householder triangulation)#

Solves Rx = Qᵀb.

This method is stable and accurate.
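
A minimal sketch of this approach on hypothetical random data (the sizes m, n and the data are made up for illustration):

m, n = 20, 3
A = randn(m, n); b = randn(m)
Q, R = qr(A)               # Householder QR
z = (Q' * b)[1:n]          # first n entries of Qᵀb
x_qr = R \ z               # back substitution
@show norm(x_qr - A \ b)   # agrees with Julia's least-squares backslash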

Solution by Cholesky#

The mathematically equivalent form (AᵀA) x = Aᵀb is called the normal equations. The solution process involves factoring the symmetric and positive definite n×n matrix AᵀA.

The product AᵀA is ill-conditioned: κ(AᵀA) = κ(A)², which can reduce the accuracy of a least squares solution.
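
A sketch of the normal-equations approach on hypothetical random data, including a check of the squared condition number:

m, n = 20, 3
A = randn(m, n); b = randn(m)
x_chol = cholesky(Symmetric(A' * A)) \ (A' * b)
@show norm(x_chol - A \ b)       # fine here, but accuracy degrades as κ(A) grows
@show cond(A' * A), cond(A)^2    # κ(AᵀA) = κ(A)²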

Solution by Singular Value Decomposition#

Next, we will discuss a factorization

$$U \Sigma V^T = A$$

where U and V have orthonormal columns and Σ is diagonal with nonnegative entries. The entries of Σ are called singular values and this decomposition is the singular value decomposition (SVD).

Singular values of a linear operator (matrix A) are the square roots of the (necessarily non-negative) eigenvalues of the self-adjoint operator A*A (where A* denotes the adjoint of A).
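
This relationship is easy to verify numerically (a sketch with a hypothetical random real matrix, for which the adjoint is just the transpose):

A = randn(4, 3)
@show svdvals(A)                                   # descending order
@show sqrt.(reverse(eigvals(Symmetric(A' * A))))   # matches, up to roundoff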

It may remind you of an eigenvalue decomposition X Λ X⁻¹ = A, but

  • the SVD exists for all matrices (including non-square and rank-deficient matrices)

  • U, V have orthonormal columns (while X can be arbitrarily ill-conditioned).

  • Indeed, if a matrix is symmetric and positive definite (all positive eigenvalues), then U = V and Σ = Λ (see the sketch below).

Computing an SVD requires a somewhat complicated iterative algorithm, but a crude estimate of the cost is 2mn² + 11n³. Note that this is similar to the cost of QR when m ≫ n, but much more expensive for square matrices.
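
Here is a sketch of that symmetric positive definite case, using a hypothetical random matrix that is SPD by construction:

B = randn(3, 3); S = B' * B + I       # symmetric positive definite by construction
U, Σ, V = svd(S)
λ = reverse(eigvals(Symmetric(S)))    # eigenvalues, reordered descending to match Σ
@show norm(Σ - λ)                     # singular values equal eigenvalues
@show norm(U - V)                     # U = V (up to roundoff) for an SPD matrix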

4. Geometry of the Singular Value Decomposition#

default(aspect_ratio=:equal)

function peanut()
    theta = LinRange(0, 2*pi, 50)
    r = 1 .+ .4*sin.(3*theta) + .6*sin.(2*theta)
    r' .* [cos.(theta) sin.(theta)]'
end

function circle()
    theta = LinRange(0, 2*pi, 50)
    [cos.(theta) sin.(theta)]'
end

function Aplot(A)
    "Plot a transformation from X to Y, first applied to a peanut shape and then to a circle"
    X = peanut()
    Y = A * X
    p = scatter(X[1,:], X[2,:], label="in")
    scatter!(p, Y[1,:], Y[2,:], label="out")
    X = circle()
    Y = A * X
    q = scatter(X[1,:], X[2,:], label="in")
    scatter!(q, Y[1,:], Y[2,:], label="out")
    plot(p, q, layout=2)
end
Aplot (generic function with 1 method)

Multiplication by a scalar: αI#

Perhaps the simplest transformation is a scalar multiple of the identity.

Aplot(1.1*I)

Diagonal matrices#

We get the same behavior if we apply a diagonal matrix D whose entries are all equal:

Aplot([2 0; 0 2]) # In this case D = 2 I

The diagonal entries can be different sizes. Example:

$$A = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix}$$
Aplot([2 0; 0 .5])

In this case, the circle becomes an ellipse that is aligned with the coordinate axes (with semi-axes a = 2 = a₁₁ and b = 0.5 = a₂₂).

Givens Rotation (as example of orthogonal matrix)#

We can rotate the input using a 2×2 matrix, parametrized by θ. Its transpose rotates in the opposite direction.

function givens(theta)
    s = sin(theta)
    c = cos(theta)
    [c -s; s c]
end

G = givens(0.3)
Aplot(G)

Let’s look at the effect of its transpose:

Aplot(G')
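
A quick check (sketch) that G is orthogonal and orientation-preserving:

G = givens(0.3)
@show norm(G' * G - I)   # GᵀG = I (orthogonal)
@show det(G)             # +1: a rotation, not a reflection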

Reflections#

We’ve previously seen that reflectors have the form F = I - 2vvᵀ where v is a normalized vector. Reflectors satisfy FᵀF = I and F = Fᵀ, thus F² = I (i.e., F is an involution).

function reflect(theta)
    v = [cos(theta), sin(theta)]
    I - 2 * v * v'
end

Aplot(reflect(0.3))
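
A quick check (sketch) of those reflector properties:

F = reflect(0.3)
@show norm(F' * F - I)   # FᵀF = I
@show norm(F * F - I)    # F² = I (involution)
@show det(F)             # -1: orientation is reversed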

Singular Value Decomposition#

The SVD is A = UΣVᵀ where U and V have orthonormal columns and Σ is diagonal with nonnegative entries. It exists for any matrix (non-square, singular, etc.). If we think of orthogonal matrices as reflections/rotations, this says any matrix can be represented by the sequence of operations: reflect/rotate, diagonally scale, and reflect/rotate again.

Let’s try a random symmetric matrix.

A = randn(2, 2)
A += A' # make symmetric
@show det(A) # Positive means orientation is preserved
Aplot(A)
det(A) = -7.387133825214749
U, S, V = svd(A) # using the Julia built-in
@show norm(U * diagm(S) * V' - A) # Should be zero
Aplot(V') # Rotate/reflect in preparation for scaling
norm(U * diagm(S) * V' - A) = 7.691850745534255e-16
  • What visual features indicate that this is a symmetric matrix?

  • Is the orthogonal matrix a reflection or rotation?

    • Does this change when the determinant is positive versus negative (rerun the cell above as needed).

Let’s look at the parts of the SVD#

Aplot(diagm(S)) # scale along axes
Aplot(U) # rotate/reflect back

Putting it together#

Aplot(U * diagm(S) * V') # SVD

Observations#

  • The circle always maps to an ellipse

  • The U and V factors may reflect even when det A > 0 (checked below)
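
A sketch making the second observation concrete: since Σ has nonnegative entries, sign(det A) = det(U) · det(V), so both orthogonal factors can be reflections (determinant −1) even when det A > 0.

A = randn(2, 2)
U, S, V = svd(A)
@show det(A)
@show det(U), det(V)                   # each ±1; rerun to see both signs occur
@show sign(det(A)) ≈ det(U) * det(V)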