12) Newton methods#
Last time#
Convergence classes
Intro to Newton methods
Today#
Newton-Raphson Method
An implementation
Convergence of fixed-point (by mean value theorem)
Convergence of fixed-point (by Taylor series)
A fresh derivation of Newton’s method via convergence theory
using Plots
default(linewidth=4, legendfontsize=12)
f(x) = cos(x) - x
hasroot(f, a, b) = f(a) * f(b) < 0  # sign change implies a root in (a, b)
function bisect_iter(f, a, b, tol)
    hist = Float64[]
    while abs(b - a) > tol
        mid = (a + b) / 2
        push!(hist, mid)          # record each midpoint approximation
        if hasroot(f, a, mid)     # keep the half-interval that brackets the root
            b = mid
        else
            a = mid
        end
    end
    hist
end
Convergence classes#
Tip: Check this Reading.
A convergent rootfinding algorithm produces a sequence of approximations $x_k$ such that
$$ \lim_{k \to \infty} x_k = x_*, $$
where $f(x_*) = 0$. For analysis, it is convenient to define the errors $e_k = x_k - x_*$.

We saw that:

An algorithm is q-linearly convergent if $\lim_{k \to \infty} |e_{k+1}| / |e_k| = \rho < 1$.
An algorithm is r-linearly convergent if $|e_k| \le c\,\rho^k$ for all sufficiently large $k$, for some constants $c$ and $\rho < 1$.
Note
Why is $r$-convergence considered a weaker form of convergence relative to $q$-convergence? Notice that root convergence is concerned only with the overall rate of decrease of the error, while quotient convergence requires the error to decrease at each iteration of the algorithm. Thus $q$-convergence is a stronger form of convergence than $r$-convergence: $q$-convergence implies $r$-convergence (but not conversely).
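To make the distinction concrete, here is a small synthetic error sequence (an illustration added here, not part of the lecture data): it is bounded above by $2\rho^k$, so it converges r-linearly with rate $\rho = 1/2$, but its step-to-step ratios never settle below 1, so it is not q-linearly convergent.

# Synthetic errors: e_k = ρ^k for even k and 2ρ^k for odd k.
# |e_k| ≤ 2ρ^k, so the sequence is r-linear with rate ρ = 0.5,
# but the ratios |e_{k+1}|/|e_k| alternate between 2ρ = 1.0 and ρ/2 = 0.25,
# so the error does not shrink at every step: not q-linear.
ρ = 0.5
e = [isodd(k) ? 2 * ρ^k : ρ^k for k in 0:15]
@show e[2:end] ./ e[1:end-1]          # alternating ratios 1.0, 0.25, ...
scatter(abs.(e), yscale=:log10, label="|e_k|")
plot!(k -> 2 * ρ^(k - 1), label="r-linear bound")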
hist = bisect_iter(f, -1, 3, 1e-10)
r = hist[end] # What are we trusting?
hist = hist[1:end-1]
scatter(abs.(hist .- r), yscale=:log10, label="bisection error")
ks = 1:length(hist)
ρ = 0.5
plot!(ks, 4 * (ρ .^ ks), label="4ρ^k")

The bisection errors track the bound $4\rho^k$ with $\rho = 1/2$ (each iteration halves the bracket): this is r-linear convergence, even though the error itself does not decrease monotonically from one iteration to the next.
1. Newton-Raphson Method#
Much of numerical analysis reduces to Taylor series, the approximation
$$ f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2}(x - x_0)^2 + \cdots $$
centered on some reference point $x_0$.

In numerical computation, it is exceedingly rare to look beyond the first-order approximation
$$ \tilde f_{x_0}(x) = f(x_0) + f'(x_0)(x - x_0). $$
Since $\tilde f_{x_0}(x)$ is a linear function, we can explicitly compute the unique solution of $\tilde f_{x_0}(x) = 0$ as $x = x_0 - f(x_0)/f'(x_0)$. Taking this solution as the next reference point and iterating yields Newton's method:
$$ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. $$
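As a quick worked step (values rounded for illustration), one Newton update for $f(x) = \cos x - x$ starting from $x_0 = 1$:

# One Newton step by hand for f(x) = cos(x) - x at x0 = 1:
#   f(1)  = cos(1) - 1   ≈ -0.4597
#   f'(1) = -sin(1) - 1  ≈ -1.8415
#   x1 = x0 - f(x0)/f'(x0) ≈ 0.7504, already close to the root x_* ≈ 0.7391
x0 = 1.0
x1 = x0 - (cos(x0) - x0) / (-sin(x0) - 1)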
1.1 An implementation#
function newton(f, fp, x0; tol=1e-8, verbose=false)
    x = x0
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        if verbose
            println("iteration $k: x=$x, f(x)=$fx, f'(x)=$fpx")
        end
        if abs(fx) < tol
            return x, fx, k   # converged: root, residual, iteration count
        end
        x = x - fx / fpx      # Newton update
    end
    error("Newton did not converge within 100 iterations")
end
f(x) = cos(x) - x
fp(x) = -sin(x) - 1
newton(f, fp, 1; tol=1e-15, verbose=true)
That’s really fast!#
10 digits of accuracy in 4 iterations.
How is this convergence test different from the one we used for bisection?
How can this break down?
newton(f, fp, -pi/2+.1; verbose=true)

Here the initial guess sits near $-\pi/2$, where $f'(x) = -\sin x - 1 \approx 0$, so the first step divides by a nearly-zero derivative and throws the iterate far from the root.
2. Convergence of fixed-point (by mean value theorem)#
Consider the fixed-point iteration
$$ x_{k+1} = g(x_k), $$
where $g$ is continuously differentiable with fixed point $x_* = g(x_*)$. By the mean value theorem, there is some $c_k$ between $x_k$ and $x_*$ such that
$$ x_{k+1} - x_* = g(x_k) - g(x_*) = g'(c_k)\,(x_k - x_*). $$

Taking absolute values,
$$ |e_{k+1}| = |g'(c_k)|\,|e_k|, $$
which converges to zero if $|g'(x)| \le \rho < 1$ for all $x$ in a neighborhood of $x_*$ containing the iterates.
3. Convergence of fixed-point (by Taylor series)#
Consider again the iteration
$$ x_{k+1} = g(x_k) $$
with fixed point $x_* = g(x_*)$. Expanding $g$ in a Taylor series about $x_*$,
$$ g(x_k) = g(x_*) + g'(x_*)(x_k - x_*) + O\big((x_k - x_*)^2\big). $$

In terms of the error $e_k = x_k - x_*$, this reads
$$ e_{k+1} = g'(x_*)\, e_k + O(e_k^2). $$
Poll 12.1: Convergence class of the fixed-point method#
Is this convergence:
A=q-linear
B=r-linear
C=neither
Recall the definition of q-linear convergence:
$$ \lim_{k \to \infty} \frac{|e_{k+1}|}{|e_k|} = \rho < 1. $$

Since $e_{k+1} = g'(x_*)\,e_k + O(e_k^2)$, the ratio $|e_{k+1}|/|e_k|$ tends to $|g'(x_*)|$. So, clearly, we need
$$ |g'(x_*)| < 1 $$
for q-linear convergence, with rate $\rho = |g'(x_*)|$.
Aside: big-$O$ notation#
Limit $n \to \infty$:#
We'd say an algorithm costs $O(n^2)$ if its running time $T(n)$ satisfies $T(n) \le c\,n^2$ for some constant $c$ and all sufficiently large $n$.
Sometimes we write $T(n) = O(n^2)$, but the $=$ really means "is in the set", $T \in O(n^2)$.
Note that $O(n^2)$ is only an upper bound: an algorithm with $T(n) = n$ is also in $O(n^2)$.
We say the algorithm is in $\Theta(n^2)$ ("big theta") if there are positive constants $c_1, c_2$ such that $c_1 n^2 \le T(n) \le c_2 n^2$ for all sufficiently large $n$.
Limit $h \to 0$:#
In numerical analysis, we often have a small real number, and now the definitions take the limit as the small number goes to zero. So we say a term in an expression is in $O(h^p)$ if there is a constant $c$ such that $|\text{term}| \le c\,h^p$ as $h \to 0$.
Big-$O$ terms can be manipulated: for example, $h \cdot O(h^p) = O(h^{p+1})$ and $O(h^p) + O(h^q) = O(h^{\min(p,q)})$.
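As a small numerical illustration (added here): since $\sin h = h - h^3/6 + O(h^5)$, the remainder $\sin h - h$ is in $O(h^3)$, so halving $h$ should divide it by about 8:

# The remainder sin(h) - h is O(h^3): halving h divides it by ≈ 8,
# and (sin(h) - h) / h^3 approaches the Taylor coefficient -1/6.
for h in [0.1, 0.05, 0.025, 0.0125]
    println("h = $h:  (sin(h) - h)/h^3 = ", (sin(h) - h) / h^3)
end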
Example of a fixed-point iteration#
We wanted to solve $f(x) = \cos x - x = 0$, which is equivalent to the fixed-point problem $x = g(x)$ with $g(x) = \cos x$.
xstar, _ = newton(f, fp, 1.)
g(x) = cos(x)
gprime(x) = -sin(x)
@show xstar
@show gprime(xstar)
plot([x->x, g], xlims=(-2, 3), label = ["y=x" "y=g(x)"])
scatter!([xstar], [xstar],
label="\$x_*\$")
function fixed_point(g, x, n)
    xs = [x]          # record all iterates, starting from x0
    for k in 1:n
        x = g(x)      # fixed-point update x_{k+1} = g(x_k)
        push!(xs, x)
    end
    xs
end
xs = fixed_point(g, 2., 15)
plot!(xs, g.(xs), seriestype=:path, marker=:auto)
Verifying fixed-point method convergence theory#
@show gprime(xstar)
es = xs .- xstar
es[2:end] ./ es[1:end-1]
scatter(abs.(es), yscale=:log10, label="fixed point")
plot!(k -> abs(gprime(xstar))^k, label="\$|g'|^k\$")
Reading: FNC Fixed-point iteration#
Plotting Newton convergence#
function newton_hist(f, fp, x0; tol=1e-12)
    x = x0
    hist = []
    for k in 1:100 # max number of iterations
        fx = f(x)
        fpx = fp(x)
        push!(hist, [x fx fpx])  # record iterate, residual, derivative
        if abs(fx) < tol
            return vcat(hist...) # concatenate rows into a matrix
        end
        x = x - fx / fpx
    end
    error("Newton did not converge within 100 iterations")
end
xs = newton_hist(f, fp, 1.97)
@show x_star = xs[end,1]
@show xs[1:end-1,1] .- x_star
plot(abs.(xs[1:end-1, 1] .- x_star), yscale=:log10, marker=:auto,
     xlabel="# of iterations", ylabel="abs error")
Poll 12.2: Convergence of Newton’s method#
Is this convergence:
A=q-linear
B=r-linear
C=neither?
Formulations are not unique (constants):#
If $x_* = g(x_*)$, then for any constant $c \ne 0$, the map
$$ g_2(x) = x + c\,\big(g(x) - x\big) $$
has the same fixed point, $x_* = g_2(x_*)$, and its derivative is $g_2'(x) = 1 + c\,\big(g'(x) - 1\big)$. Can we choose $c$ to make $|g_2'(x_*)|$ small?
c = .5
g2(x) = x + c * (cos(x) - x)
g2p(x) = 1 + c * (-sin(x) - 1)
@show g2p(xstar)
plot([x->x, g, g2], ylims=(-5, 5), label=["y=x" "y=g(x)" "y=g2(x)"])
xs = fixed_point(g2, 1., 15)
xs .- xstar

With $c = 1/2$ we get $g_2'(x_*) \approx 0.16$, so the error shrinks by roughly a factor of 6 per iteration, much faster than the rate $|g'(x_*)| \approx 0.67$ for the original $g$.
Formulations are not unique (functions)#
If $x_* = g(x_*)$, then for any smooth $h(x)$ bounded away from $0$, the map
$$ g_3(x) = x + h(x)\,\big(g(x) - x\big) $$
also has fixed point $x_*$. Can we choose $h(x)$ so that $g_3'(x_*) = 0$?
h(x) = -1 / (gprime(x) - 1)
g3(x) = x + h(x) * (g(x) - x)
plot([x-> x, cos, g2, g3], ylims=(-5, 5), label=["y=x" "y=g(x)" "y=g2(x)" "y=g3(x)"])
Exercise 12.1:#
Compute $g_3'(x_*)$ for the choice $h(x) = -1/\big(g'(x) - 1\big)$ used above.

Note:
We don't know $g'(x_*)$ in advance because we don't know $x_*$ yet.
This method converges very fast (see the quick check below).
We actually just derived Newton's method.
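A quick numerical check of that speed, reusing `g3`, `fixed_point`, and `xstar` from above:

# Fixed-point iteration with g3 converges very rapidly: each iteration of
# g3 is exactly a Newton step for cos(x) - x = 0.
xs = fixed_point(g3, 1.0, 5)
@show xs .- xstar   # errors shrink quadratically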
4. A fresh derivation of Newton’s method via convergence theory#
A rootfinding problem $f(x) = 0$ can be converted to a fixed-point problem
$$ x = x + f(x) =: g(x), $$
but there is no guarantee that $g'(x_*) = 1 + f'(x_*)$ will have magnitude less than 1.
Problem-specific algebraic manipulation can be used to make $|g'(x_*)|$ small.
$x = x + h(x)\,f(x)$ is also a valid formulation for any $h(x)$ bounded away from $0$.
Can we choose $h(x)$ such that
$$ g'(x) = 1 + h'(x)\,f(x) + h(x)\,f'(x) = 0 $$
when $f(x) = 0$?

In other words, we need $h(x_*) = -1/f'(x_*)$. Taking $h(x) = -1/f'(x)$ gives the iteration
$$ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, $$
which is Newton's method.
Quadratic convergence!#
What does it mean that $g'(x_*) = 0$? The Taylor-series error analysis above now gives $e_{k+1} = O(e_k^2)$.
It turns out that Newton's method has locally quadratic convergence to simple roots:
$$ \lim_{k \to \infty} \frac{|e_{k+1}|}{|e_k|^2} < \infty. $$
"The number of correct digits doubles each iteration."
Now that we know how to make a good guess more accurate, the effort lies in getting a good starting guess.
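We can verify this numerically with the Newton history computed above (a small check added here):

# For quadratic convergence the ratios |e_{k+1}| / |e_k|^2 approach a
# finite constant, here |f''(x_*) / (2 f'(x_*))| ≈ 0.22 for f(x) = cos(x) - x.
es = abs.(newton_hist(f, fp, 1.97)[:, 1] .- x_star)
@show es[2:end-1] ./ es[1:end-2] .^ 2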
Rootfinding summary and outlook#
Newton methods are immensely successful
Convergence theory is local; we need good initial guesses
Computing the derivative $f'(x)$ is intrusive (and can be error-prone)
Avoided by secant methods, which approximate the derivative (see the sketch below)
Algorithmic or numerical differentiation can also be used instead of computing the derivative by hand
Bisection is robust when conditions are met
When does Newton diverge?
More topics
Do we find all the roots?
Times when Newton converges slowly
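Here is a minimal sketch of the secant idea mentioned above (an illustration added here, not a robust implementation): replace $f'(x_k)$ with the finite-difference slope through the last two iterates.

# Minimal secant-method sketch: approximate f'(x_k) by the slope
# (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1}), so no derivative is required.
function secant(f, x0, x1; tol=1e-12)
    f0, f1 = f(x0), f(x1)
    for k in 1:100
        abs(f1) < tol && return x1, f1, k
        slope = (f1 - f0) / (x1 - x0)  # finite-difference derivative
        x0, f0 = x1, f1
        x1 -= f1 / slope               # secant update
        f1 = f(x1)
    end
    error("secant did not converge within 100 iterations")
end

secant(x -> cos(x) - x, 1.0, 0.9)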
Exercise 12.2:#
1. Solve the fixed-point iteration …, up to a tolerance … or a maximum number of iterations …, with (a) …, (b) …. Which execution is better? And why?
2. Solve the fixed-point iteration …, up to a tolerance … or a maximum number of iterations …, with (a) …, (b) …. Which execution is better? And why?