14) Linear Algebra#
Today#
Matrices as linear transformations
Polynomial evaluation and fitting
Orthogonality
using Plots
default(linewidth=4, legendfontsize=12)
1. Matrices as linear transformations#
Tip
Resource: An excellent resource for linear algebra is the MIT OpenCourseWare course by Prof. Gilbert Strang (now retired), in particular the Fall 2011 undergraduate course edition (18.06). Video collections of his other lectures are also available.
Linear algebra is the study of linear transformations on vectors, which represent points in a finite dimensional space. The matrix-vector product \(y = A x\) is a linear combination of the columns of \(A\). The familiar definition,
\[ y_i = \sum_{j} A_{i,j} x_j, \]
can also be viewed as
\[ y = A_{:,1} x_1 + A_{:,2} x_2 + \dotsb + A_{:,n} x_n, \]
that is, a linear combination of the columns of \(A\).
Math and Julia Notation#
The notation \(A_{i,j}\) corresponds to the Julia syntax A[i,j], and the colon : means the entire range (row or column). So \(A_{:,j}\) is the \(j\)th column and \(A_{i,:}\) is the \(i\)th row. The corresponding Julia syntax is A[:,j] and A[i,:].
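For example, here is a small illustration of this indexing (the matrix is ours, not from the notes):
A = [1 2 3; 4 5 6]
@show A[1, 2] # the entry in row 1, column 2
@show A[:, 2] # the 2nd column, a 2-element Vector
@show A[1, :]; # the 1st row; note that rows are also extracted as (column) Vectors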
Julia has syntax for row vectors, column vectors, and arrays.
[1 2 3] # a row vector
1×3 Matrix{Int64}:
1 2 3
[1; 2; 3] # a column vector
3-element Vector{Int64}:
1
2
3
[1. 2 3; 4 5 6] # a 2x3 real matrix
# compare with Python's syntax: np.array([[1, 2, 3], [4, 5, 6]])
2×3 Matrix{Float64}:
1.0 2.0 3.0
4.0 5.0 6.0
[1 2; 4 3]
2×2 Matrix{Int64}:
1 2
4 3
[1 0; 0 2; 10 3]
3×2 Matrix{Int64}:
1 0
0 2
10 3
[1; 2 + 1im; 3]' # ' is transpose, and for complex-valued matrices is the conjugate transpose ("adjoint")
1×3 adjoint(::Vector{Complex{Int64}}) with eltype Complex{Int64}:
1+0im 2-1im 3+0im
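For comparison (our aside), transpose performs a plain transpose without complex conjugation:
transpose([1; 2 + 1im; 3]) # compare with ' above: 2 + 1im stays 2 + 1im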
Implementing multiplication by row#
function matmult1(A, x)
m, n = size(A)
y = zeros(m)
for i in 1:m # row index first, i.e., for each row
for j in 1:n # we iterate over the columns
y[i] += A[i,j] * x[j] # we apply the familiar definition
end
end
y
end
matmult1 (generic function with 1 method)
A = reshape(1.:12, 3, 4) # a 3x4 matrix with the numbers 1:12
3×4 reshape(::StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, 3, 4) with eltype Float64:
1.0 4.0 7.0 10.0
2.0 5.0 8.0 11.0
3.0 6.0 9.0 12.0
x = [10., 0, 0, 0]
4-element Vector{Float64}:
10.0
0.0
0.0
0.0
matmult1(A, x)
3-element Vector{Float64}:
10.0
20.0
30.0
# Dot product
A[2, :]' * x
20.0
function matmult2(A, x)
m, n = size(A)
y = zeros(m)
for i in 1:m # iterate over rows of A
y[i] = A[i,:]' * x # this way we use the dot product between the transposed row of A and the whole x
end
y
end
matmult2(A, x)
3-element Vector{Float64}:
10.0
20.0
30.0
Curiosity: which one is faster? Which one takes more allocations?
@time matmult1(A,x)
@time matmult2(A,x)
0.000006 seconds (1 allocation: 80 bytes)
0.000007 seconds (4 allocations: 368 bytes)
3-element Vector{Float64}:
10.0
20.0
30.0
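@time measures a single run and is sensitive to noise (and, on a first call, compilation). For more reliable numbers one could use the BenchmarkTools package; a sketch, assuming it is installed:
using BenchmarkTools
@btime matmult1($A, $x) # $ interpolates the arguments so that setup is not timed
@btime matmult2($A, $x)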
Implementing multiplication by column#
function matmult3(A, x)
m, n = size(A)
y = zeros(m)
for j in 1:n # iterate over columns of A
y += A[:, j] * x[j]
end
y
end
matmult3(A, x)
3-element Vector{Float64}:
10.0
20.0
30.0
@time matmult3(A, x)
0.000007 seconds (13 allocations: 1.016 KiB)
3-element Vector{Float64}:
10.0
20.0
30.0
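The extra allocations come from the slice A[:, j], which copies the column, and from y += ..., which builds a new vector on every iteration. A sketch of a version avoiding both (the name matmult3_views is ours):
function matmult3_views(A, x)
    m, n = size(A)
    y = zeros(m)
    for j in 1:n
        @views y .+= A[:, j] .* x[j] # @views indexes without copying; .+= updates y in place
    end
    y
end
@time matmult3_views(A, x)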
A * x # built-in matrix-vector multiply
3-element Vector{Float64}:
10.0
20.0
30.0
@time A * x # We'll use this version
0.000005 seconds (1 allocation: 80 bytes)
3-element Vector{Float64}:
10.0
20.0
30.0
Check the standard operations documentation page.
2. Polynomial evaluation#
Polynomial evaluation is (continuous) linear algebra#
Mapping a coefficient vector \(c\) to the polynomial \(p(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3\) is a linear transformation: the polynomial is a linear combination of the monomial basis functions \(1, x, x^2, x^3\). For example,
using Pkg
Pkg.add("Polynomials")
using Polynomials
P(c) = Polynomial(c) # build a polynomial from its coefficient vector
p = [0, -3, 0, 5] # coefficients of -3x + 5x^3 in the canonical (monomial) basis
q = [1, 2, 3, 4]
@show f = P(p)
@show g = P(q)
h = f + g
@show h
@show P(p+q) # equals f + g: mapping coefficients to polynomials is a linear transformation!
x = [0., 1, 2]
h.(x) # can be applied to each element of a vector with the dot operator
Resolving package versions...
No Changes to `~/.julia/environments/v1.10/Project.toml`
No Changes to `~/.julia/environments/v1.10/Manifest.toml`
f = P(p) = Polynomial(-3*x + 5*x^3)
g = P(q) = Polynomial(1 + 2*x + 3*x^2 + 4*x^3)
h = Polynomial(1 - x + 3*x^2 + 9*x^3)
P(p + q) = Polynomial(1 - x + 3*x^2 + 9*x^3)
3-element Vector{Float64}:
1.0
12.0
83.0
plot(h, legend=:bottomright, xlim=(-2, 2))
Polynomial evaluation is (discrete) linear algebra#
V = [one.(x) x x.^2 x.^3] # Vandermonde matrix
3×4 Matrix{Float64}:
1.0 0.0 0.0 0.0
1.0 1.0 1.0 1.0
1.0 2.0 4.0 8.0
V * p + V * q # same as h.(x)
3-element Vector{Float64}:
1.0
12.0
83.0
V * (p + q) # again, matrix multiplication is a linear transformation
3-element Vector{Float64}:
1.0
12.0
83.0
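As a cross-check (our addition), Julia's built-in evalpoly evaluates a polynomial from its ascending-order coefficients and should reproduce the same three values:
evalpoly.(x, Ref(p + q)) # Ref keeps the coefficient vector from being broadcast over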
Vandermonde matrices#
A Vandermonde matrix is one whose columns are the monomials \(1, x, x^2, \dots\) evaluated at a set of discrete points.
function vander(x, k=nothing)
if isnothing(k)
k = length(x)
end
m = length(x)
V = ones(m, k)
for j in 2:k
V[:, j] = V[:, j-1] .* x
end
V
end
vander (generic function with 2 methods)
@show x = LinRange(-1, 1, 50)
V = vander(x, 4)
scatter(x, V, legend=:bottomright, label = ["V_1" "V_2" "V_3" "V_4"])
x = LinRange(-1, 1, 50) = LinRange{Float64}(-1.0, 1.0, 50)
Fitting (polynomial interpolation) is linear algebra#
x1 = [-.9, 0.1, .5, .8]
y1 = [1, 2.4, -.2, 1.3]
scatter(x1, y1, markersize=8)
V = vander(x1)
@show size(V)
p = V \ y1 # write y1 in the polynomial basis; left-division, read V^{-1} * y1 (i.e., solve V p = y1 for the coefficients p)
scatter(x1, y1, markersize=8, xlims=(-1, 1))
# plot!(P(p), label="P(p)")
plot!(x, vander(x, 4) * p, label="\$ V(x) p\$", linestyle=:dash)
size(V) = (4, 4)
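Since V is square and nonsingular, the fit interpolates the data exactly; a quick check (our addition):
vander(x1) * p ≈ y1 # true: the fitted polynomial reproduces the data points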
Some common terminology#
The range of \(A\) is the space spanned by its columns. This definition coincides with the range of a function \(f(x)\) when \(f(x) = A x\).
The (right) nullspace of \(A\) is the space of vectors \(x\) such that \(A x = 0\).
The rank of \(A\) is the dimension of its range.
A matrix has full rank if the nullspace of either \(A\) or \(A^T\) is trivial (contains only the 0 vector). Equivalently, all the columns of \(A\) (or \(A^T\)) are linearly independent; see the quick check after this list.
A nonsingular (or invertible) matrix is a square matrix of full rank. We call the inverse \(A^{-1}\) and it satisfies \(A^{-1} A = A A^{-1} = I\).
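LinearAlgebra provides rank and nullspace for checking these definitions numerically; a small sketch (the matrix C is ours):
using LinearAlgebra
C = [1 2; 2 4] # rank deficient: the second column is twice the first
@show rank(C) # 1, not full rank
@show nullspace(C); # an orthonormal basis for the nullspace, proportional to [-2, 1]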
\(\DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\null}{null} \)
Poll 15.1: If \(A \in \mathbb{R}^{m\times m}\), which of these doesn’t belong?#
\(A\) has an inverse, \(A^{-1}\)
\(\rank (A) = m\)
\(\null(A) = \{0\}\)
\(A A^T = A^T A\)
\(\det(A) \ne 0\)
\(A x = 0\) implies that \(x = 0\)
Pkg.add("LinearAlgebra")
using LinearAlgebra
A = rand(4, 4)
B = A' * A - A * A' # answer: no. 4 doesn't belong! A A^T = A^T A holds only for normal matrices (e.g., symmetric or orthogonal ones - more later), not for every invertible matrix
@show B
det(A)
Resolving package versions...
No Changes to `~/.julia/environments/v1.10/Project.toml`
No Changes to `~/.julia/environments/v1.10/Manifest.toml`
B = [0.4604803467837072 -0.39735175587097926 0.4235515952872667 0.4777312134489571; -0.39735175587097926 -1.2686917015919867 -0.6586784735378189 -0.459195328124272; 0.4235515952872667 -0.6586784735378189 -0.48731906585032037 0.04901441435627363; 0.4777312134489571 -0.459195328124272 0.04901441435627363 1.2955304206586]
-0.1193511533879711
What is an inverse?#
When we write \(x = A^{-1} y\), we mean that \(x\) is the unique vector such that \(A x = y\). (It is rare that we explicitly compute a matrix \(A^{-1}\), though it’s not as “bad” as people may have told you.) A vector \(y\) is equivalent to \(\sum_i e_i y_i\) where \(e_i\) are columns of the identity. Meanwhile, \(x = A^{-1} y\) means that we are expressing that same vector \(y\) in the basis of the columns of \(A\), i.e., \(\sum_i A_{:,i} x_i\).
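A small numerical illustration of this reading of the inverse (the names A0, y0, x0 are ours):
using LinearAlgebra
A0 = rand(4, 4); y0 = rand(4)
x0 = A0 \ y0 # the coefficients of y0 in the basis of the columns of A0
@show norm(A0 * x0 - y0); # ~1e-16: the columns of A0, weighted by x0, reconstruct y0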
using LinearAlgebra
A = rand(4, 4)
4×4 Matrix{Float64}:
0.326576 0.635067 0.37926 0.328146
0.506579 0.334969 0.982688 0.731816
0.203791 0.0209698 0.123718 0.0392216
0.679442 0.759598 0.191247 0.063243
A \ A # left-division, read A^{-1} * A; the result is the identity: 1's on the diagonal and numerically-zero entries elsewhere
4×4 Matrix{Float64}:
1.0 0.0 0.0 -1.89805e-14
-1.97811e-16 1.0 0.0 9.72807e-15
-1.03241e-15 0.0 1.0 4.64995e-14
1.36209e-15 0.0 0.0 1.0
inv(A) * A
4×4 Matrix{Float64}:
1.0 5.27e-15 1.37572e-14 2.84297e-14
1.50684e-15 1.0 4.53645e-15 -2.77006e-15
-2.14199e-14 -1.40743e-13 1.0 -6.19817e-14
4.90319e-14 8.017e-14 -8.61476e-15 1.0
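In practice we solve systems with left-division rather than by forming inv(A): both give the same answer here, but \ factorizes A, which is generally cheaper and more accurate. A quick comparison (our addition):
y0 = rand(4)
norm(A \ y0 - inv(A) * y0) # tiny: the two approaches agree to rounding error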
3. Orthogonality#
The inner product of two vectors,
\[ x^T y = \sum_i x_i y_i, \]
measures how strongly they point in the same direction: the angle \(\theta\) between \(x\) and \(y\) satisfies \(\cos \theta = \frac{x^T y}{\lVert x \rVert \, \lVert y \rVert}\), and the vectors are orthogonal when \(x^T y = 0\).
Examples with inner products#
x = [0, 1]
y = [1, 1]
@show x' * y
@show y' * x;
x' * y = 1
y' * x = 1
ϕ = pi/6
y = [cos(ϕ), sin(ϕ)] # a unit vector at angle ϕ from the x-axis
cos_θ = x'*y / (norm(x) * norm(y))
@show cos_θ
@show cos(ϕ-pi/2); # x = [0, 1] points along the y-axis, so the angle between x and y is θ = π/2 - ϕ
cos_θ = 0.49999999999999994
cos(ϕ - pi / 2) = 0.4999999999999999
Polynomials can be orthogonal too!#
x = LinRange(-1, 1, 50)
A = vander(x, 4)
M = A * [.5 0 0 0; # 0.5
0 1 0 0; # x
0 0 1 0]' # x^2
# that is, M = [0.5 | x | x^2]
scatter(x, M, label = ["M_1" "M_2" "M_3"])
plot!(x, 0*x, label=:none, color=:black)
Which inner products will be zero?
Which of these functions are even and which are odd? (Over a symmetric interval, the inner product of an even function with an odd one vanishes.)
Polynomial inner products#
M[:,1]' * M[:,2]
-2.220446049250313e-16
M[:,1]' * M[:,3]
8.673469387755102
M[:,2]' * M[:,3]
-4.440892098500626e-16
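The three pairwise products above are off-diagonal entries of the Gram matrix \(M^T M\) (our addition): the even-odd pairs (1,2) and (2,3) vanish to rounding error, while the even-even pair (1,3) does not.
M' * M # 3×3 Gram matrix of all pairwise inner products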