43) Extra: Sparse Linear Algebra#
1. Sparse direct solvers
  1.1 Matrix orderings
  1.2 Impact on formulation
  1.3 Cost scaling
2. Iterative solvers
using Plots
default(linewidth=3)
using LinearAlgebra
using SparseArrays
function my_spy(A)
    cmax = norm(vec(A), Inf)
    s = max(1, ceil(120 / size(A, 1)))  # marker size scaled to the matrix dimension
    # symmetric color limits so zero entries sit at the center of the diverging colormap
    spy(A, marker=(:square, s), c=:diverging_rainbow_bgymr_45_85_c67_n256, clims=(-cmax, cmax))
end
function laplacian_matrix(n)
    # 5-point Laplacian on an n-by-n grid with periodic boundary conditions
    h = 2 / n
    rows = Vector{Int64}()
    cols = Vector{Int64}()
    vals = Vector{Float64}()
    wrap(i) = (i + n - 1) % n + 1          # periodic wrap of an index into 1:n
    idx(i, j) = (wrap(i) - 1) * n + wrap(j)
    stencil_diffuse = [-1, -1, 4, -1, -1] / h^2
    for i in 1:n
        for j in 1:n
            append!(rows, repeat([idx(i, j)], 5))
            append!(cols, [idx(i-1, j), idx(i, j-1), idx(i, j), idx(i+1, j), idx(i, j+1)])
            append!(vals, stencil_diffuse)
        end
    end
    sparse(rows, cols, vals)
end
laplacian_matrix (generic function with 1 method)
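With periodic boundary conditions this operator is singular: constants are in its null space, which is why a small shift is added before factoring it later on. A quick check:
A = laplacian_matrix(4)
@show norm(A * ones(16))   # constants are in the null space, so this is 0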
1. Sparse Direct Solvers#
Example: Advection-diffusion operator in 2D#
Dirichlet boundary conditions are eliminated around all sides (boundary values do not appear as unknowns)
function advdiff_matrix(n; kappa=1, wind=[0, 0])
    # Advection-diffusion operator on an n-by-n grid; Dirichlet boundary
    # values are eliminated, so h spans n+1 intervals.
    h = 2 / (n + 1)
    rows = Vector{Int64}()
    cols = Vector{Int64}()
    vals = Vector{Float64}()
    idx((i, j),) = (i - 1) * n + j
    in_domain((i, j),) = 1 <= i <= n && 1 <= j <= n
    stencil_advect = [-wind[1], -wind[2], 0, wind[1], wind[2]] / h
    stencil_diffuse = [-1, -1, 4, -1, -1] * kappa / h^2
    stencil = stencil_advect + stencil_diffuse
    for i in 1:n
        for j in 1:n
            neighbors = [(i-1, j), (i, j-1), (i, j), (i+1, j), (i, j+1)]
            mask = in_domain.(neighbors)   # drop couplings to eliminated boundary points
            append!(rows, idx.(repeat([(i, j)], 5))[mask])
            append!(cols, idx.(neighbors)[mask])
            append!(vals, stencil[mask])
        end
    end
    sparse(rows, cols, vals)
end
advdiff_matrix (generic function with 1 method)
A = advdiff_matrix(5, wind=[0, 0])
@show norm(A - A')
my_spy(A)
norm(A - A') = 0.0
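With nonzero wind the advection term breaks symmetry, so the Cholesky factorization below requires wind = [0, 0]. A quick check:
B = advdiff_matrix(5, wind=[1, 0])
@show norm(B - B')   # nonzero: advection makes the operator nonsymmetric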
Gaussian elimination and Cholesky#
Performing Gaussian elimination, we obtain the decomposition \(M = L U\), where \(L\) is unit lower triangular and \(U\) is upper triangular. Given a \(2\times 2\) block matrix, the algorithm proceeds as
\begin{split}
\begin{bmatrix} A & B \\ C & D \end{bmatrix} =
\begin{bmatrix} L_A & \\ C U_A^{-1} & I \end{bmatrix}
\begin{bmatrix} U_A & L_A^{-1} B \\ & S \end{bmatrix}
\end{split}
where \(L_A U_A = A\) and \(S = D - C A^{-1} B\) is the Schur complement.
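A small numeric sanity check of this identity (a sketch with hypothetical \(2\times 2\) blocks; NoPivot keeps \(L_A U_A = A\) exact):
A = [4.0 1; 1 3]; B = [1.0 0; 2 1]; C = [2.0 1; 0 1]; D = [5.0 1; 1 4]
F = lu(A, NoPivot())                    # L_A U_A = A
S = D - C * (A \ B)                     # Schur complement
Z = zeros(2, 2)
L = [F.L Z; C / F.U Matrix(1.0I, 2, 2)]
U = [F.U F.L \ B; Z S]
@show norm(L * U - [A B; C D])          # ≈ 0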
Cholesky#
With Cholesky factorization, we obtain the decomposition \(M = L L^T\), which means that for a \(2\times 2\) block matrix we have
\begin{split}
\begin{bmatrix} A & B^T \\ B & D \end{bmatrix} =
\begin{bmatrix} L_A & \\ B L_A^{-T} & I \end{bmatrix}
\begin{bmatrix} L_A^T & L_A^{-1} B^T \\ & S \end{bmatrix}
\end{split}
where \(L_A L_A^T = A\) and \(S = D - B A^{-1} B^T\).
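The same check for the Cholesky form (hypothetical SPD \(A\)):
A = [4.0 1; 1 3]; B = [1.0 2; 0 1]; D = [6.0 1; 1 5]
L_A = cholesky(A).L                     # L_A L_A' = A
S = D - B * (A \ B')                    # Schur complement
Z = zeros(2, 2)
L = [L_A Z; B / L_A' Matrix(1.0I, 2, 2)]
U = [L_A' L_A \ B'; Z S]
@show norm(L * U - [A B'; B D])         # ≈ 0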
A = advdiff_matrix(10)
N = size(A, 1)
ch = cholesky(A, perm=1:N)
my_spy(A)
my_spy(sparse(ch.L))
Cost of a banded solve#
Consider an \(N\times N\) matrix with bandwidth \(b\), \(1 \le b \le N\).
Work one row at a time
Each row/column of the active panel has \(b\) nonzeros
The Schur complement update affects a \(b\times b\) sub-matrix
Total compute cost \(N b^2\)
Storage cost \(N b\)
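For example, an \(n\times n\) grid in 2D with the natural (row-by-row) ordering has bandwidth \(b = n = N^{1/2}\), giving compute cost \(N b^2 = N^2\) and storage cost \(N b = N^{3/2}\).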
Question#
What bandwidth \(b\) is needed for an \(N = n\times n \times n\) cube in 3 dimensions?
What is the memory cost?
What is the compute cost?
my_spy(sparse(ch.L))
1.1 Matrix orderings#
n = 20
A = advdiff_matrix(n)
heatmap(reshape(1:n^2, n, n))  # visualize the natural (row-by-row) ordering
using Pkg
Pkg.add("Metis")
import Metis
perm, iperm = Metis.permutation(A)
heatmap(reshape(iperm, n, n))  # visualize the Metis (nested-dissection) ordering
cholesky(A, perm=1:n^2)
SparseArrays.CHOLMOD.Factor{Float64, Int64}
type: LLt
method: simplicial
maxnnz: 8019
nnz: 8019
success: true
cholesky(A, perm=Vector{Int64}(perm))
SparseArrays.CHOLMOD.Factor{Float64, Int64}
type: LLt
method: simplicial
maxnnz: 4031
nnz: 4031
success: true
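Note the fill: with the identity permutation, nnz \(\approx N b = 400 \cdot 20 = 8000\), matching the banded estimate above, while the Metis (nested-dissection) ordering roughly halves it.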
1.2 Impact on formulation#
Cholesky factors in nested dissection#
n = 10
A = advdiff_matrix(n)
perm, iperm = Metis.permutation(A)
my_spy(A[perm, perm])
ch = cholesky(A, perm=Vector{Int64}(perm))
my_spy(sparse(ch.L))
The dense blocks in factor \(L\) are “supernodes”
They correspond to “vertex separators” in the ordering
1.3 Cost scaling#
Cost in nested dissection#
Cost is dominated by dense factorization of the largest supernode
Its size comes from the vertex separator size
Example: 2D square#
\(N = n^2\) dofs
Vertex separator of size \(v = n\)
Compute cost \(v^3 = n^3 = N^{3/2}\)
Storage cost \(N \log N\)
Example: 3D Cube#
\(N = n^3\) dofs
Vertex separator of size \(v = n^2\)
Compute cost \(v^3 = n^6 = N^2\)
Storage cost \(v^2 = n^4 = N^{4/3}\)
Questions#
How much does the cost change if we switch from Dirichlet to periodic boundary conditions in 2D?
How much does the cost change if we move from the 5-point stencil (\(O(h^2)\) accuracy) to a 9-point “star” stencil (\(O(h^4)\) accuracy)?
Would you rather solve a 3D problem on a \(10\times 10\times 10000\) grid or a \(100\times 100 \times 100\) grid?
Let’s test our intuition
n = 80
A_dirichlet = advdiff_matrix(n)
perm, iperm = Metis.permutation(A_dirichlet)
cholesky(A_dirichlet, perm=Vector{Int64}(perm))
# cholesky(A_dirichlet) # try this with CHOLMOD's default ordering
SparseArrays.CHOLMOD.Factor{Float64, Int64}
type: LLt
method: supernodal
maxnnz: 0
nnz: 136289
success: true
A_periodic = laplacian_matrix(n) + 1e-10*I  # small shift: the periodic operator is singular (constants in the null space)
perm, iperm = Metis.permutation(A_periodic)
cholesky(A_periodic, perm=Vector{Int64}(perm))
# cholesky(A_periodic) # try this
SparseArrays.CHOLMOD.Factor{Float64, Int64}
type: LLt
method: supernodal
maxnnz: 0
nnz: 180479
success: true
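So periodic boundary conditions cost noticeably more here (180479 vs 136289 stored nonzeros): wrapping the domain couples opposite sides and enlarges the vertex separators, though the asymptotic scaling is unchanged.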
How expensive and how fast?#
Suppose we have a second-order accurate method in 3D.
n = 2.0 .^ (2:13)                 # Float64 to avoid Int64 overflow in N .^ 2 below
N = n .^ 3                        # dofs on an n^3 grid
error = (50 ./ n) .^ 2            # second-order accuracy model
seconds = 1e-10 * N .^ 2          # O(N^2) factorization cost at ~10^10 flops/second
hours = seconds / 3600
cloud_dollars = 3 * hours
kW_hours = 0.2 * hours
barrel_of_oil = kW_hours / 1700
kg_CO2 = kW_hours * 0.709
;
cost = hours
plot(cost, error, xlabel="cost (hours)", ylabel="error", xscale=:log10, yscale=:log10, label="error")
Remember: There is always an environmental cost#
Outlook on sparse direct solvers#
Sparse direct works well for 2D and almost-2D problems up to moderately large sizes
High order FD methods make sparse direct cry
High-order finite elements are okay, but not high-continuity splines
Almost-2D includes a lot of industrial solid mechanics applications
The body of a car, the frame of an airplane
Sparse direct is rarely usable in “fully 3D” problems
“thick” structures
soil mechanics, hydrology, building foundations, bones, tires
fluid mechanics
aerodynamics, heating/cooling systems, atmosphere/ocean
Setup cost (factorization) is much more expensive than solve
Amortize cost in time-dependent problems
Rosenbrock methods: factorization reused across stages
“lag” Jacobian in Newton (results in “modified Newton”)
“lag” preconditioner with matrix-free iterative methods (Sundials, PETSc)
Factorization pays off if you have many right-hand sides (see the sketch below)
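For instance (a minimal sketch reusing A_dirichlet from above), one factorization can serve many right-hand sides:
perm_d, _ = Metis.permutation(A_dirichlet)
ch = cholesky(A_dirichlet, perm=Vector{Int64}(perm_d))  # expensive setup, paid once
for trial in 1:5
    b = randn(size(A_dirichlet, 1))
    x = ch \ b          # two triangular solves; no refactorization
end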
2. Iterative solvers#
Less reliable, more leaky abstraction
More sensitive to problem formulation
Slower for small problems
Several different strategies, each with tuning knobs
Accuracy tolerances needed (see the CG sketch below)
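As a minimal illustration, here is a textbook unpreconditioned conjugate gradient (a sketch assuming the advdiff_matrix definition above and SPD \(A\), i.e. wind = [0, 0]; my_cg is a hypothetical helper, not a library routine); reltol and maxiter are the tuning knobs:
function my_cg(A, b; reltol=1e-6, maxiter=1000)
    x = zero(b)
    r = copy(b)                         # residual b - A*x for x = 0
    p = copy(r)
    rho = dot(r, r)
    for it in 1:maxiter
        Ap = A * p
        alpha = rho / dot(p, Ap)
        x .+= alpha .* p
        r .-= alpha .* Ap
        rho_new = dot(r, r)
        if sqrt(rho_new) <= reltol * norm(b)
            return x, it
        end
        p .= r .+ (rho_new / rho) .* p  # new search direction, A-conjugate to the previous
        rho = rho_new
    end
    x, maxiter
end
A = advdiff_matrix(40)
b = randn(size(A, 1))
x, its = my_cg(A, b)
@show its norm(A * x - b) / norm(b)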
\(O(N)\) solvers exist for many important problems#
High-order discretization can be okay
Need multi-level representations (hierarchy of approximations)
This can be interpreted very generally
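To make “hierarchy of approximations” concrete, here is a minimal two-grid sketch for the 1D Laplacian (damped Jacobi smoothing, linear interpolation, Galerkin coarse operator; illustrative only, not a tuned multigrid):
using LinearAlgebra, SparseArrays
lap1d(n) = spdiagm(-1 => -ones(n-1), 0 => 2ones(n), 1 => -ones(n-1))
function twogrid(A, b, x; sweeps=3)
    n = length(b)
    D = Diagonal(A)
    for _ in 1:sweeps                   # damped Jacobi pre-smoothing
        x += (2 / 3) * (D \ (b - A * x))
    end
    nc = (n - 1) ÷ 2
    P = spzeros(n, nc)                  # linear interpolation from the coarse grid
    for j in 1:nc
        P[2j-1, j] = 0.5; P[2j, j] = 1.0; P[2j+1, j] = 0.5
    end
    Ac = P' * A * P                     # Galerkin coarse operator
    x += P * (Ac \ (P' * (b - A * x)))  # coarse-grid correction (small direct solve)
    for _ in 1:sweeps                   # post-smoothing
        x += (2 / 3) * (D \ (b - A * x))
    end
    x
end
n = 127                                 # coarse grid has (n - 1) ÷ 2 = 63 points
A = lap1d(n); b = randn(n); x = zeros(n)
for k in 1:10
    global x = twogrid(A, b, x)
end
@show norm(A * x - b) / norm(b)         # drops rapidly; each cycle is O(n) work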