25) Interpolation#

Last time:#

  • Introduction to Fortran

Today:#

  1. Interpolation using polynomials

  2. Lagrange Interpolating Polynomials

  3. Newton polynomials

  4. Choice of points and choice of bases

using LinearAlgebra
using Plots
using LaTeXStrings
default(linewidth=4, legendfontsize=12)

function vander(x, k=nothing)
    # m×k Vandermonde matrix with columns 1, x, x^2, ..., x^(k-1)
    if isnothing(k)
        k = length(x) # square by default
    end
    m = length(x)
    V = ones(m, k)
    for j in 2:k
        V[:, j] = V[:, j-1] .* x # column j holds x^(j-1)
    end
    V
end

function peanut()
    # peanut-shaped parametric curve used to visualize linear transformations
    theta = LinRange(0, 2*pi, 50)
    r = 1 .+ .4*sin.(3*theta) + .6*sin.(2*theta)
    r' .* [cos.(theta) sin.(theta)]'
end

function circle()
    # unit circle sampled at 50 points
    theta = LinRange(0, 2*pi, 50)
    [cos.(theta) sin.(theta)]'
end

function Aplot(A)
    "Plot a transformation from X to Y"
    X = peanut()
    Y = A * X
    p = scatter(X[1,:], X[2,:], label="in")
    scatter!(p, Y[1,:], Y[2,:], label="out")
    X = circle()
    Y = A * X
    q = scatter(X[1,:], X[2,:], label="in")
    scatter!(q, Y[1,:], Y[2,:], label="out")
    plot(p, q, layout=2, aspect_ratio=:equal)
end
Aplot (generic function with 1 method)

1. Interpolation using polynomials#

What is interpolation?#

Given data \((x_i, y_i)\), find a (smooth?) function \(f(x)\) such that \(f(x_i) = y_i\).

Data in#

  • direct field observations/measurement of a physical or social system

  • numerically processed observations, perhaps by applying physical principles

  • output from an expensive “exact” numerical computation

  • output from an approximate numerical computation

Functions to use as interpolants#

  • Polynomials

  • Piecewise polynomials (includes nearest-neighbor)

  • Powers and exponentials

  • Trigonometric functions (sine and cosine)

  • Neural networks

Interpolation fits the data exactly!

Polynomial interpolation#

Recall from the Linear Algebra lecture that we can fit a polynomial to data with a Vandermonde matrix: one column per basis function and one row per observation.

\[\begin{split} \begin{bmatrix} 1 & x_0 & x_0^2 & \ldots & x_0^{m-1} \\ 1 & x_1 & x_1^2 & \ldots & x_1^{m-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \ldots & x_n^{m-1} \end{bmatrix}_{A \in \mathbb R^{(n+1) \times m}} \begin{bmatrix} p_0 \\ p_1 \\ \vdots \\ p_{m-1} \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix} \end{split}\]

which is the matrix form of the system of equations:

(9)#\[\begin{align} p(x_0) &= p_0 + p_1 x_0 + p_2 x_0^2 + \ldots + p_{m-1} x_0^{m-1} = y_0\\ p(x_1) &= p_0 + p_1 x_1 + p_2 x_1^2 + \ldots + p_{m-1} x_1^{m-1} = y_1\\ \vdots & \\ p(x_n) &= p_0 + p_1 x_n + p_2 x_n^2 + \ldots + p_{m-1} x_n^{m-1} = y_n \end{align}\]

Poll 25.1: It’s possible to find a unique interpolating polynomial \(\mathbf p \in P_n\) when which of the following are true?#

  1. \(m \le n+1\)

  2. \(m = n+1\)

  3. \(m \ge n+1\)

Hint: Fundamental Theorem of Algebra. (A nonzero polynomial of degree \(n\) has at most \(n\) roots, so two polynomials in \(P_n\) that agree at \(n+1\) distinct points must be identical.)

Polynomial interpolation with a Vandermonde matrix#

x = LinRange(-1.5, 2, 4)
y = sin.(x)
A = vander(x)
p = A \ y # p = A^{-1} y

scatter(x, y)
s = LinRange(-3, 3, 50)
plot!(s, [sin.(s) vander(s, length(p)) * p])

Vandermonde matrices can be ill-conditioned#

A = vander(LinRange(-1, 1, 20))
cond(A)
2.7224082312417406e8

Reflection:

  1. Is it because of the points \(x\)?

  2. Is it because of the basis functions \(\{ 1, x, x^2, x^3, \dotsc \}\)?

2. Lagrange Interpolating Polynomials#

Suppose we are given function values \(y_0, \dotsc, y_m\) at the distinct points \(x_0, \dotsc, x_m\), and we would like to build a polynomial of degree at most \(m\) that goes through all these points. This explicit construction is attributed to Lagrange (though it is said that he was not the first):

\[ l_i(x)= \frac{(x-x_0)(x-x_1) \cdots (x-x_{i-1})(x-x_{i+1}) \cdots (x-x_m)}{(x_i-x_0)(x_i-x_1) \cdots (x_i-x_{i-1})(x_i-x_{i+1}) \cdots (x_i-x_m)}, \qquad i = 0, 1, \ldots, m. \]

By construction, each \(l_i(x)\) is a degree-\(m\) polynomial with \(l_i(x_j) = \delta_{ij}\) (where \(\delta_{ij}\) denotes the Kronecker delta): \(l_i(x)\) has zeros at all the \(x_j\) except \(x_i\), where it equals \(1\). Hence the set \(\{l_0(x), l_1(x), \ldots, l_m(x)\}\) forms a basis, the Lagrange basis, for polynomials of degree \(\le m\) on those nodes.

Lagrange’s interpolating formula

\[ p(x) = \sum_{i=0}^m y_i l_i(x) \equiv \sum_{i=0}^m y_i \prod_{\substack{j = 0\\ j \ne i}}^{m} \frac{x - x_j}{x_i - x_j} \]

  • What is the degree of this polynomial?

  • Why is \(p(x_i) = y_i\)?

  • How expensive (in terms of \(m\)) is it to evaluate \(p(x)\)?

  • How expensive (in terms of \(m\)) is it to convert to standard form \(p(x) = \sum_{i=0}^m a_i x^i\)?

  • Can we easily evaluate the derivative \(p'(x)\)?

  • What can go wrong? Is this formulation numerically stable?

Let’s look at some concrete examples:

  • Suppose we are given the three points \((x_0, y_0), (x_1, y_1), (x_2, y_2)\). Then the polynomial

\[ p_2(x) = y_0 \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)} + y_1 \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)} + y_2 \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)}\]

is the Lagrange interpolating polynomial for these points. First, notice that the polynomial interpolates the data points exactly: when \(x_0\) is substituted for \(x\), the terms evaluate to \(y_0 + 0 + 0 = y_0\), and similarly at \(x_1\) and \(x_2\). Second, notice that the polynomial has degree 2 in the variable \(x\).

Example:#

Find an interpolating polynomial for the data points \((0, 1)\), \((2, 2)\), and \((3, 4)\). Substituting into Lagrange’s formula yields:

(10)#\[\begin{align} p_2(x) &= 1 \, \frac{(x-2)(x-3)}{(0-2)(0-3)} + 2 \, \frac{(x-0)(x-3)}{(2-0)(2-3)} + 4 \, \frac{(x-0)(x-2)}{(3-0)(3-2)} \\ &= \frac{1}{6}(x^2 - 5x + 6) + 2\left(-\frac{1}{2}\right)(x^2 - 3x) + 4\left(\frac{1}{3}\right)(x^2 - 2x) \\ &= \frac{x^2}{2} - \frac{x}{2} + 1. \end{align}\]

Check that \(p_2(0)=1\), \(p_2(2)=2\), and \(p_2(3)=4\).
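A quick check in Julia (p2 here is just this worked example's polynomial, written out by hand, not something defined elsewhere in these notes):

p2(x) = x^2/2 - x/2 + 1       # the polynomial derived above
p2.([0, 2, 3]) == [1, 2, 4]   # true: it reproduces the data exactly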

x = [0,2,3]
y = [1,2,4]
scatter(x, y, label = L"(x_i, y_i)")

Lagrange interpolation in code#

function lagrange(x, y)
    # Return a function that evaluates the Lagrange interpolant of (x, y) at t
    function p(t)
        w = 0
        for (i, yi) in enumerate(y)
            # y_i times the i-th Lagrange basis function evaluated at t
            w += yi * (prod(t .- x[1:i-1]) * prod(t .- x[i+1:end])
                / (prod(x[i] .- x[1:i-1]) * prod(x[i] .- x[i+1:end])))
        end
        w
    end
    return p
end


x = LinRange(-1.5, 2, 4)
y = sin.(x)
p = lagrange(x, y)

scatter(x, y, label = L"(x_i, y_i)")
s = LinRange(-3, 3, 50)

plot!(s, [sin.(s) p.(s)], label = ["sin(x)" "p(x)"])
# Notice how important this is: the empty product equals one,
# which makes the first and last terms in `lagrange` work
prod(Float64[])
1.0

We don’t have cond(lagrange(x, y))#

Recall the relative condition number of a differentiable function \(f\):

\[ \kappa(f) = \lvert f' \rvert \frac{|x|}{|f|} \]
but this definition depends on the input \(x\), and it is difficult to explore that whole input space.

  • We also don’t have an easy way to evaluate derivatives.

k = 5; xx = LinRange(-1, 1, k)
B = vander(s, k) / vander(xx) # 50×5 matrix whose columns are the Lagrange basis functions for the nodes xx, evaluated at s
plot(s, B) # plot the columns of B
scatter!(xx, [zero.(xx) one.(xx)], color=:black, legend=:none, ylims=(-2, 2)) # mark the nodes; set xlims=(-1, 1) for a closer look

3. Newton polynomials#

Given the set of data points \((x_0,y_0), (x_1,y_1), \ldots, (x_m,y_m)\), the Newton interpolation polynomial is a linear combination of Newton basis polynomials:

\[ N(x) = \sum_{k=0}^{m} a_k n_k(x) \]

with the Newton basis polynomials defined as

\[ n_k(x) = \prod_{i=0}^{k-1} (x - x_i) \]

where no two \(x_j\) are the same (the empty product gives \(n_0(x) = 1\)). The coefficient \(a_k\) depends only on the data points up to \((x_k, y_k)\).

The idea here is to build up the polynomial inductively, adding one point at a time. The added term will not “spoil” the previous work.

This gives the Vandermonde interpolation problem as

\[ \Big[ 1 \Big| (x - x_0) \Big| (x - x_0)(x - x_1) \Big| \dotsb \Big] \Big[ p \Big] = \Big[ y \Big] \]
  • How does the Vandermonde procedure change if we replace the monomial \(x^k\) with \(n_k(x)\)?

  • Does the matrix have recognizable structure?

  • By construction, adding a new point to the interpolation is easy: the coefficients already computed do not change.

  • The Newton coefficients \(a_k\) are defined inductively via divided differences (see any standard reference on divided differences for their tabulated form); a minimal sketch follows this list.
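The sketch below is ours, not code used elsewhere in these notes (the name newton_coeffs is hypothetical). It overwrites a copy of y in place following the divided-difference recurrence and produces the coefficients \(a_k\) in \(O(m^2)\) operations; the result should match the triangular solve vander_newton(x, x) \ y used further down.

function newton_coeffs(x, y)
    a = float.(y) # work on a floating-point copy of y
    for k in 2:length(x)          # order of the divided difference
        for i in length(x):-1:k   # overwrite in place, bottom-up
            a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k+1])
        end
    end
    a # a[k] is the divided difference of the first k data points
end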

function vander_newton(x, abscissa=nothing)
    if isnothing(abscissa)
        abscissa = x
    end
    n = length(abscissa)
    A = ones(length(x), n) # first column is n_0(x) = 1
    for i in 2:n
        A[:,i] = A[:,i-1] .* (x .- abscissa[i-1]) # column i holds n_{i-1}(x)
    end
    A
end
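A quick check of the "does not spoil previous work" claim (the fourth data point here is hypothetical, added only for this demonstration): because the matrix is lower triangular, appending a point appends a row and a column, and the previously computed coefficients are unchanged.

x3 = [0, 2, 3];    y3 = [1, 2, 4]
x4 = [0, 2, 3, 4]; y4 = [1, 2, 4, 3] # same data plus one hypothetical point
a3 = vander_newton(x3, x3) \ y3
a4 = vander_newton(x4, x4) \ y4
a3 ≈ a4[1:3] # true: the first three Newton coefficients did not change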

A = vander_newton(s, x)
plot(s, A, ylims=(-3, 3), label = ["A_1" "A_2" "A_3" "A_4"])
scatter!(x, [zero.(x)], color=:black, label=nothing)
p = vander_newton(x, x) \ y
scatter(x, y)
plot!(s, [sin.(s), vander_newton(s, x) * p], label = ["sin(x)" "vander_newton p(x)"])

Newton Vandermonde matrix structure#

  • How much does it cost to solve with a general \(n\times n\) dense matrix?

    1. \(O(n \log n)\)

    2. \(O(n^2)\)

    3. \(O(n^3)\)

vander_newton(LinRange(-1, 1, 5))
5×5 Matrix{Float64}:
 1.0  0.0  -0.0   0.0   -0.0
 1.0  0.5   0.0  -0.0    0.0
 1.0  1.0   0.5   0.0   -0.0
 1.0  1.5   1.5   0.75   0.0
 1.0  2.0   3.0   3.0    1.5
  • How much does it cost to solve with a Newton Vandermonde matrix?

You can rearrange the Newton Vandermonde interpolation problem using nested multiplications, which reduces the cost to \(O(n^2)\): the matrix is lower triangular, so the coefficients follow from forward substitution in \(O(n^2)\) operations, after which the Newton form can be evaluated in \(O(n)\) operations per point.
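A minimal sketch of that nested (Horner-like) evaluation, with our hypothetical name newton_eval and a the Newton coefficients from the triangular solve: since \(n_k(x) = (x - x_{k-1})\, n_{k-1}(x)\), the Newton form factors as \(N(t) = a_1 + (t - x_1)\bigl(a_2 + (t - x_2)(a_3 + \cdots)\bigr)\) in 1-based indexing.

function newton_eval(a, x, t)
    w = a[end]
    for k in length(a)-1:-1:1
        w = a[k] + (t - x[k]) * w # one multiply and add per level
    end
    w
end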

How is the conditioning of these matrices?#

vcond(mat, points, nmax) = [cond(mat(points(-1, 1, n))) for n in 2:nmax] # condition number as a function of the number of points

plot([vcond(vander, LinRange, 20)], yscale=:log10, legend=:none)

A well-conditioned basis:#

Let’s construct the Vandermonde matrix, but instead of using the canonical basis for polynomials, we’ll use the Legendre polynomials.

function vander_legendre(x, k=nothing)
    if isnothing(k)
        k = length(x) # Square by default
    end
    m = length(x)
    Q = ones(m, k)
    Q[:, 2] = x
    for n in 1:k-2
        Q[:, n+2] = ((2*n + 1) * x .* Q[:, n+1] - n * Q[:, n]) / (n + 1) # three-term recurrence (Bonnet's recursion) for the Legendre polynomials
    end
    Q
end
vander_legendre (generic function with 2 methods)
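As a quick sanity check of the recurrence, recall that \(P_0(x) = 1\), \(P_1(x) = x\), and \(P_2(x) = (3x^2 - 1)/2\):

vander_legendre([0.5], 3) # 1×3 matrix [1.0 0.5 -0.125]: P_0, P_1, P_2 at x = 0.5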
plot([vcond(vander_legendre, LinRange, 20)], yscale=:log10)

4. Choice of points and choice of bases#

Let’s look at different sets of points and bases, starting with the Chebyshev nodes:

# The following are called Chebyshev nodes
CosRange(a, b, n) = (a + b)/2 .+ (b - a)/2 * cos.(LinRange(-pi, 0, n))
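For example, the five Chebyshev nodes on \([-1, 1]\) cluster near the endpoints instead of being equally spaced:

CosRange(-1, 1, 5) # ≈ [-1.0, -0.707, 0.0, 0.707, 1.0]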

plot([vcond(vander, LinRange, 20)], yscale=:log10, legend=:topleft, label= "vander LinRange")
plot!([vcond(vander, CosRange, 20)], yscale=:log10, label = "vander CosRange")
plot!([vcond(vander_legendre, LinRange, 20)], yscale=:log10, label = "vander_legendre LinRange")
plot!([vcond(vander_legendre, CosRange, 20)], yscale=:log10, label =  "vander_legendre CosRange")

What’s wrong with ill-conditioning?#

runge1(x) = 1 / (1 + 10*x^2)

x_lin = LinRange(-1, 1, 20)
x_cos = CosRange(-1, 1, 20)
y_lin = runge1.(x_lin)
y_cos = runge1.(x_cos)
plot(runge1, xlims=(-1, 1))
scatter!(x_lin, y_lin)
plot(runge1, xlims=(-1, 1))
scatter!(x_cos, y_cos)
ourvander = vander_legendre
p = ourvander(x_lin) \ y_lin
scatter(x_lin, y_lin, label = L"(x_i,y_i)")
s = LinRange(-1, 1, 100)
plot!(s, runge1.(s), label = "runge")
plot!(s, ourvander(s, length(x_lin)) * p, label = "vander_legendre", title = "equi-spaced points")
# let's try again with the Chebyshev nodes
ourvander = vander_legendre
p = ourvander(x_cos) \ y_cos
scatter(x_cos, y_cos, label = L"(x_i,y_i)")
s = LinRange(-1, 1, 100)
plot!(s, runge1.(s), label = "runge")
plot!(s, ourvander(s, length(x_cos)) * p, label = "vander_legendre", title = "Chebyshev nodes")
# let's look at what happened with the equispaced points
ourvander(s, length(x_lin)) # 100×20 matrix
ourvander(x_lin) # 20×20 matrix
ourvander(s, length(x_lin)) / ourvander(x_lin) # 100×20 matrix that maps data at x_lin to values at s
cond(ourvander(s, length(x_lin)) / ourvander(x_lin))
4115.285306798416
cond(ourvander(x_lin))
7274.598185488539

Important

The take-home message here is that high-degree interpolation through equally spaced points can be disastrous (and should be avoided when possible). However, with the right choice of points, high-degree interpolants work well.
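A quick quantitative check under the same setup (maxerr is our name; this reuses ourvander, runge1, and the node sets defined above): the maximum error of the degree-19 interpolant on a fine grid is dramatically worse for the equispaced nodes.

s_fine = LinRange(-1, 1, 1000)
maxerr(xpts, ypts) = maximum(abs.(ourvander(s_fine, length(xpts)) * (ourvander(xpts) \ ypts) - runge1.(s_fine)))
@show maxerr(x_lin, y_lin) # equispaced nodes: large error near ±1 (Runge phenomenon)
@show maxerr(x_cos, y_cos) # Chebyshev nodes: small error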

The worst vector#

Which data vector is amplified the most by interpolation through the equispaced points? The SVD of the interpolation operator \(A\) answers this: the first right singular vector is the data amplified by the largest factor, the leading singular value.

A = ourvander(s, length(x_lin)) / ourvander(x_lin)

U, S, V = svd(A) # thin SVD: U is 100×20, S holds 20 singular values, V is 20×20
plot(x_lin, V[:, 1]) # first right singular vector: the nodal data that is amplified the most
plot(s, U[:, 1]) # first left singular vector: the resulting shape on the fine grid, scaled by S[1]
@show S[1]
S[1] = 4141.821782071815
4141.821782071815