- Christos Konstantinou
- Callum Abbott
- Zhang Yuying
- Guanxing Huang
Write an R function, `newton`, implementing Newton's method for minimization of functions, and provide example code using it to optimize Rosenbrock's function.
Your `newton` optimization function should operate broadly in the same way as `nlm`. Note that the purpose is to have an independent implementation: you must code the optimization yourself, not simply call optimization code written by someone else.
The function signature should be as follows,
```
newton(theta, f, ..., tol = 1e-8, fscale = 1, maxit = 100, max.half = 20)
```
with the arguments defined as follows:
- `theta` is a vector of initial values for the optimization parameters.
- `f` is the objective function to minimize. Its first argument is the vector of optimization parameters. Remaining arguments will be passed from `newton` using `...`. The scalar value returned by `f` will have two attributes, a `"gradient"` vector and, optionally, a `"hessian"` matrix (a sketch of such a function follows this list).
- `...` any arguments of `f` after the first (the parameter vector) are passed using this (see hints below).
- `tol` the convergence tolerance.
- `fscale` a rough estimate of the magnitude of `f` at the optimum, used in convergence testing.
- `maxit` the maximum number of Newton iterations to try before giving up.
- `max.half` the maximum number of times a step should be halved before concluding that the step has failed to improve the objective.
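For concreteness, here is a sketch (not part of the specification) of how Rosenbrock's function might be written to match this interface; the `k` argument is an assumption, included only to illustrate an extra argument passed via `...`:

```r
## Sketch of an objective matching the interface above: Rosenbrock's function.
## The value is returned with "gradient" and "hessian" attributes attached.
rb <- function(theta, k = 10) {
  x <- theta[1]; z <- theta[2]
  val <- k * (z - x^2)^2 + (1 - x)^2
  attr(val, "gradient") <- c(-4 * k * x * (z - x^2) - 2 * (1 - x),  # df/dx
                              2 * k * (z - x^2))                    # df/dz
  attr(val, "hessian") <- matrix(c(12 * k * x^2 - 4 * k * z + 2, -4 * k * x,
                                   -4 * k * x,                    2 * k), 2, 2)
  val
}
```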
Your `newton` function should return a list containing the following elements (illustrated by the example call after the list):
- `f` the value of the objective function at the minimum.
- `theta` the value of the parameters at the minimum.
- `iter` the number of iterations taken to reach the minimum.
- `g` the gradient vector at the minimum (so the user can judge closeness to numerical zero).
- `Hi` the inverse of the Hessian matrix at the minimum (useful if the objective is a negative log likelihood).
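Assuming `rb` as sketched above and a `newton` implementation matching this specification, a call and the elements of the returned list might look like this (illustrative only):

```r
## Hypothetical usage: minimize Rosenbrock's function from a standard start.
fit <- newton(theta = c(-1, 2), f = rb, k = 10)
fit$f      # objective value at the minimum (should be near 0)
fit$theta  # parameters at the minimum (should be near c(1, 1))
fit$iter   # number of Newton iterations taken
fit$g      # gradient at the minimum, for judging closeness to numerical zero
fit$Hi     # inverse Hessian at the minimum
```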
The function should issue errors or warnings (using `stop` or `warning` as appropriate) in at least the following cases; a sketch of the first check follows the list.
- If the objective or derivatives are not finite at the initial `theta`.
- If the step fails to reduce the objective despite trying `max.half` step halvings.
- If `maxit` is reached without convergence.
- If the Hessian is not positive definite at convergence.
To judge whether the gradient vector is close enough to zero, you will need to consider the magnitude of the objective (you can't expect gradients to be down at 10^-10 if the objective is of order 10^10, for example). So the gradients are judged to be zero when they are smaller than `tol` multiplied by the objective. But then there is a problem if the objective is zero at the minimum: we can never succeed in making the magnitude of the gradient less than the magnitude of the objective. So `fscale` is provided. Then if `f0` is the current value of the objective and `g` the current gradient,
```
max(abs(g)) < (abs(f0)+fscale)*tol
```
is a suitable condition for convergence.
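As a sketch of how this test might sit inside the Newton loop, together with the positive definiteness check at convergence (variable names such as `H` and `iter` are assumptions):

```r
## Inside the Newton loop: f0 is the current objective value, g its gradient,
## H the current Hessian and iter the iteration counter (all assumed names).
if (max(abs(g)) < (abs(f0) + fscale) * tol) {
  ## chol() fails if H is not positive definite; chol2inv() then inverts it
  Hi <- tryCatch(chol2inv(chol(H)), error = function(e) {
    warning("Hessian is not positive definite at convergence")
    NULL
  })
  return(list(f = as.numeric(f0), theta = theta, iter = iter, g = g, Hi = Hi))
}
```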
If no Hessian matrix is supplied, your code should generate one by finite differencing the gradient vector. Such an approximate Hessian will be asymmetric: `H <- 0.5 * (t(H) + H)` fixes that.
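A sketch of one way such a finite difference Hessian could be built (the helper name `fd_hessian` and the step size `eps` are assumptions):

```r
## Approximate the Hessian by finite differencing the gradient of f, then
## symmetrize the result as described above.
fd_hessian <- function(theta, f, ..., eps = 1e-7) {
  g0 <- attr(f(theta, ...), "gradient")
  n <- length(theta)
  H <- matrix(0, n, n)
  for (i in 1:n) {
    th1 <- theta
    th1[i] <- th1[i] + eps                 # perturb the i-th parameter
    g1 <- attr(f(th1, ...), "gradient")
    H[i, ] <- (g1 - g0) / eps              # i-th row of the approximation
  }
  0.5 * (t(H) + H)                         # enforce symmetry
}
```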