15) Gram-Schmidt and QR#
Last time#
Linear Algebra
Polynomial evaluation
Orthogonality
Today#
Revisit orthogonality
Constructing orthogonal bases
2.1 Orthogonal matrices
2.2 Gram-Schmidt orthogonalization and QR factorization
using LinearAlgebra
using Plots
using Polynomials
default(linewidth=4, legendfontsize=12)
function vander(x, k=nothing)
    if isnothing(k)
        k = length(x)           # default: square Vandermonde matrix
    end
    m = length(x)
    V = ones(m, k)              # first column: the constant 1
    for j in 2:k
        V[:, j] = V[:, j-1] .* x   # each column is x times the previous one
    end
    V
end
vander (generic function with 2 methods)
1. Revisit orthogonality#
Polynomials can be orthogonal too!#
x = LinRange(-1, 1, 50)
A = vander(x, 4)
M = A * [.5 0 0 0; # 0.5
0 1 0 0; # x
0 0 1 0]' # x^2
# that is, M = [0.5 | x | x^2]
scatter(x, M, label = ["M_1" "M_2" "M_3"])
plot!(x, 0*x, label=:none, color=:black)
Which inner product will be zero?
Which functions are even and odd?
Pairwise inner product#
The pairwise inner products between two sets of vectors can be expressed by collecting the sets as columns in matrices and writing \(A = X^T Y\), where \(A_{i,j} = x_i^T y_j\). In particular, the matrix of all pairwise inner products among the columns of \(M\) is \(M^T M\):
M' * M
3×3 Matrix{Float64}:
12.5 -1.11022e-15 8.67347
-1.11022e-15 17.3469 -2.22045e-16
8.67347 -2.22045e-16 10.8272
Normalization and orthogonalization#
q1 = M[:,1]
q1 /= norm(q1) # normalize q1
Q = [q1 M[:, 2:end]]
scatter(x, Q, label = ["M_1/|M_1|" "M_2" "M_3"])
plot!(x, 0*x, label=:none, color=:black)
Q' * Q # now the first entry of the Q' * Q matrix, after normalizing the first column vector of M, is 1
3×3 Matrix{Float64}:
1.0 -1.94289e-16 2.45323
-1.94289e-16 17.3469 -2.22045e-16
2.45323 -2.22045e-16 10.8272
2. Orthogonality#
2.1 Orthogonal matrices#
If two vectors \(x\) and \(y\) are such that \(x^T y = 0\) then we say \(x\) and \(y\) are orthogonal (or “\(x\) is orthogonal to \(y\)”).
A vector is said to be normalized if \(\lVert x \rVert = 1\). If \(x\) is orthogonal to \(y\) and \(\lVert x \rVert = \lVert y \rVert = 1\) then we say \(x\) and \(y\) are orthonormal.
A square matrix with orthonormal columns is said to be an orthogonal matrix (or orthonormal matrix).
We typically use \(Q\) or \(U\) and \(V\) for matrices that are known/constructed to be orthogonal.
Orthogonal matrices are always full rank – the columns are linearly independent.
The inverse of an orthogonal matrix is its transpose: \(Q^{-1} = Q^T\), equivalently \(Q^T Q = Q Q^T = I\).
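As a quick sanity check, take a \(2\times 2\) rotation matrix (just one convenient example of an orthogonal matrix; the angle below is arbitrary):
theta = 0.3
Q = [cos(theta) -sin(theta);
     sin(theta)  cos(theta)]   # a rotation matrix is orthogonal
@show norm(Q' * Q - I)         # Q'Q is the identity, up to rounding
@show norm(inv(Q) - Q');       # the inverse equals the transpose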
2.2 Gram-Schmidt orthogonalization#
For many applications, we find ourselves interested in the successive column spaces of a matrix \(A\): \(\langle a_1\rangle \subseteq \langle a_1, a_2\rangle \subseteq \langle a_1, a_2, a_3\rangle \subseteq \cdots\)
The idea of QR factorization is the construction of a sequence of orthonormal vectors \(q_1, q_2, \ldots\) that span these successive spaces.
Thus, suppose we want to find an orthogonal basis for the span of the columns of \(A\). Given \(a_1, a_2, \ldots\), we can construct vectors \(q_1, q_2, \ldots\) and entries \(r_{ij}\) by an iterative process of successive orthogonalization, written out below.
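Written out in the standard notation, the process produces the relations
\[
\begin{aligned}
a_1 &= r_{11} q_1, \\
a_2 &= r_{12} q_1 + r_{22} q_2, \\
a_3 &= r_{13} q_1 + r_{23} q_2 + r_{33} q_3, \\
&\;\;\vdots
\end{aligned}
\]
so that collecting the \(q_i\) as columns of \(Q\) and the \(r_{ij}\) in an upper-triangular matrix \(R\) gives the reduced factorization \(A = QR\).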
A naive algorithm#
function gram_schmidt_naive(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:,j]
        for k in 1:j-1
            r = Q[:,k]' * v    # component of the current v in the direction q_k
            v -= Q[:,k] * r    # remove that component
            R[k,j] = r
        end
        R[j,j] = norm(v)       # what remains has positive norm if A is full rank
        Q[:,j] = v / R[j,j]    # normalize to obtain q_j
    end
    Q, R
end
gram_schmidt_naive (generic function with 1 method)
A = vander(x, 4)
Q, R = gram_schmidt_naive(A)
@show norm(Q' * Q - I) # Q is indeed orthogonal
@show norm(Q * R - A); # A is factored as the product of Q and R
norm(Q' * Q - I) = 6.609184312080184e-16
norm(Q * R - A) = 2.500206551829073e-16
What do orthogonal polynomials look like?#
Computations with orthogonal polynomials form the basis of spectral methods, one of the most powerful techniques for the numerical solution of partial differential equations (PDE).
x = LinRange(-1, 1, 50)
A = vander(x, 6)
Q, R = gram_schmidt_naive(A)
plot(x, Q, marker=:auto, legend=:none)
What happens if we use more than 50 values of \(x\)?
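One way to experiment is to rerun the same construction on a finer grid (the 200 points below are an arbitrary choice):
x_fine = LinRange(-1, 1, 200)       # more sample points than before
A_fine = vander(x_fine, 6)
Q_fine, _ = gram_schmidt_naive(A_fine)
plot(x_fine, Q_fine, legend=:none)  # each column still has unit discrete 2-norm, so amplitudes shrink as points are added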
Theorem:#
Every full-rank \(m\times n\) matrix (\(m \ge n\)) has a unique reduced \(Q R\) factorization with \(R_{j,j} > 0\)#
The algorithm we’re using produces positive diagonal entries \(R_{j,j} > 0\) because of the line:
R[j,j] = norm(v)
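A quick check on the factorization we just computed:
@show all(diag(R) .> 0)    # the diagonal of R is strictly positive
@show minimum(diag(R));    # smallest diagonal entry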
Solving equations using \(QR = A\)#
If \(A x = b\), then \(Rx = Q^T b\) (because \(Q\) is orthogonal, so \(Q^T Q = I\)), and solving with \(R\) is easy since it is triangular.
x1 = [-0.9, 0.1, 0.5, 0.8] # points where we know values
y1 = [1, 2.4, -0.2, 1.3]
scatter(x1, y1)
A = vander(x1, 3)
Q, R = gram_schmidt_naive(A)
p1 = R \ (Q' * y1) # our computed solution: R^{-1} Q^T y1
p2 = A \ y1 # Julia A^{-1} y1
plot!(x, vander(x, 3) * p1, label = "\$ V(x) p1\$")
plot!(x, vander(x, 3) * p2, label = "\$ V(x) p2\$", linestyle = :dash)
How accurate is it?#
m = 20
x = LinRange(-1, 1, m) # let's include more columns in the Vandermonde matrix
A = vander(x, m)
Q, R = gram_schmidt_naive(A)
@show norm(Q' * Q - I) # we are losing orthogonality; unstable algorithm
@show norm(Q * R - A)
norm(Q' * Q - I) = 1.073721107832196e-8
norm(Q * R - A) = 8.268821431611631e-16
8.268821431611631e-16
A variant with more parallelism#
We are performing the following (from right to left):
\[ w = \bigl(I - q_{j-1} q_{j-1}^T\bigr) \cdots \bigl(I - q_2 q_2^T\bigr) \bigl(I - q_1 q_1^T\bigr) a_j, \]
which can be factored in blocks,
\[ w = \bigl(I - Q_{j-1} Q_{j-1}^T\bigr) a_j, \qquad Q_{j-1} = \bigl[ q_1 \big| \cdots \big| q_{j-1} \bigr], \]
to exploit a bit of parallelism. Let’s call this classical Gram-Schmidt.
function gram_schmidt_classical(A)
    m, n = size(A)
    Q = zeros(m, n)
    R = zeros(n, n)
    for j in 1:n
        v = A[:,j]
        R[1:j-1,j] = Q[:,1:j-1]' * v    # all inner products use the original column
        v -= Q[:,1:j-1] * R[1:j-1,j]    # one blocked subtraction
        R[j,j] = norm(v)
        Q[:,j] = v / norm(v)
    end
    Q, R
end
gram_schmidt_classical (generic function with 1 method)
m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = gram_schmidt_classical(A)
@show norm(Q' * Q - I) # really not orthogonal; unstable algorithm
@show norm(Q * R - A)
norm(Q' * Q - I) = 1.4985231287367549
norm(Q * R - A) = 7.350692433565389e-16
7.350692433565389e-16
Classical vs. modified Gram-Schmidt#
Why does the order of operations matter?#
We are performing the following (from right to left):
\[ w = \bigl(I - q_{j-1} q_{j-1}^T\bigr) \cdots \bigl(I - q_1 q_1^T\bigr) a_j, \]
which can be factored in blocks as \(w = \bigl(I - Q_{j-1} Q_{j-1}^T\bigr) a_j\) only when the \(q_i\) are exactly orthonormal; that factorization is not exact in finite arithmetic.
Let’s look at this a bit more closely. In classical Gram-Schmidt (CGS), we take each vector, one at a time, and make it orthogonal to all previous vectors. If an error is made in computing \(q_2\), so that \(q_1^T q_2 = \delta\) is small but nonzero in finite arithmetic, that error is never corrected in the computations that follow; instead it propagates. In that case, \(v_3\) will not be orthogonal to \(q_1\) or \(q_2\).
Because of rounding errors, in classical Gram-Schmidt \(Q_{k-1}\) does not have truly orthogonal columns. In modified Gram-Schmidt (MGS) we compute the length of the projection of \(w = v_k\) onto \(q_1\) and subtract that projection (and the rounding errors) from \(w\). Next, we compute the length of the projection of the updated \(w\) onto \(q_2\) and subtract that projection (and the rounding errors) from \(w\), and so on, always working with the most recently updated \(w\).
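To see the two behaviors side by side, we can run both functions on the same Vandermonde matrix (note that gram_schmidt_naive orthogonalizes against the updated \(v\), so it behaves like a left-looking modified Gram-Schmidt):
m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)
Q_cgs, _ = gram_schmidt_classical(A)
Q_mgs, _ = gram_schmidt_naive(A)   # orthogonalizes against the updated v
@show norm(Q_cgs' * Q_cgs - I)     # ≈ 1, as seen above
@show norm(Q_mgs' * Q_mgs - I);    # ≈ 1e-8, as seen above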
We can look at the size of what’s left over#
We project out the components of our vectors in the directions of each \(q_j\).
x = LinRange(-1, 1, 23)
A = vander(x)
Q, R = gram_schmidt_classical(A)
scatter(diag(R), yscale=:log10)
The next vector is almost linearly dependent#
x = LinRange(-1, 1, 20)
A = vander(x)
Q, _ = gram_schmidt_classical(A)
#Q, _ = qr(A) # try it with Julia's built-in
v = A[:,end]
@show norm(v)
scatter(abs.(Q[:,1:end-1]' * v), yscale=:log10)
norm(v) = 1.4245900685395503
Right-looking modified Gram-Schmidt#
Each outer step of the modified Gram-Schmidt algorithm can be interpreted as a right-multiplication by a square upper-triangular matrix.
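In symbols, step \(j\) multiplies the working matrix on the right by a square upper-triangular matrix \(R_j\) (the usual "triangular orthogonalization" picture):
\[ A \, R_1 R_2 \cdots R_n = Q, \qquad R_1 R_2 \cdots R_n = R^{-1}. \]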
function gram_schmidt_modified(A)
    m, n = size(A)
    Q = copy(A)                               # columns are orthogonalized in place
    R = zeros(n, n)
    for j in 1:n
        R[j,j] = norm(Q[:,j])
        Q[:,j] /= R[j,j]                      # normalize the current column
        R[j,j+1:end] = Q[:,j]'*Q[:,j+1:end]   # project all remaining columns onto q_j
        Q[:,j+1:end] -= Q[:,j]*R[j,j+1:end]'  # and immediately subtract those components
    end
    Q, R
end
gram_schmidt_modified (generic function with 1 method)
m = 20
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = gram_schmidt_modified(A)
@show norm(Q' * Q - I) # better, in terms of orthogonality error
@show norm(Q * R - A)
norm(Q' * Q - I) = 8.486718528276085e-9
norm(Q * R - A) = 8.709998074379606e-16
8.709998074379606e-16
The order of operations matters!
Classical versus modified?#
Classical
Really unstable, orthogonality error of size \(1 \gg \epsilon_{\text{machine}}\)
Don’t need to know all the vectors in advance
Modified
Needs to be right-looking for efficiency
Less unstable, but orthogonality error \(10^{-9} \gg \epsilon_{\text{machine}}\)
m = 10
x = LinRange(-1, 1, m)
A = vander(x, m)
Q, R = qr(A) # the Julia built-in
@show norm(Q' * Q - I) # not so bad at all..
norm(Q' * Q - I) = 2.2760345477368898e-15
2.2760345477368898e-15