This problem is the final project of the course Optimization for MDS at CUHKSZ.
Thank you, Andre!
Let
where:
- $\lambda > 0$ is a regularization parameter
- $\lVert \cdot \rVert_{p}$ is the standard Euclidean norm
- $X^{*} = (x_{1}^{*}, \dots, x_{n}^{*})$ is the optimal solution

then
we choose
where:
After some derivation, we obtain the gradient of the Huber-norm function:
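For reference, one common Huber smoothing of the Euclidean norm, with threshold $\delta > 0$, and its gradient are shown below; this parameterization is an assumption and may differ from the exact form used in the project.
$$
\varphi_{\delta}(z)=
\begin{cases}
\dfrac{\lVert z\rVert^{2}}{2\delta}, & \lVert z\rVert \le \delta,\\
\lVert z\rVert-\dfrac{\delta}{2}, & \lVert z\rVert > \delta,
\end{cases}
\qquad
\nabla\varphi_{\delta}(z)=
\begin{cases}
\dfrac{z}{\delta}, & \lVert z\rVert \le \delta,\\
\dfrac{z}{\lVert z\rVert}, & \lVert z\rVert > \delta.
\end{cases}
$$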
The main difficulty lies in the second term of the objective function.
To speed up the computation in each iteration, we rewrite the gradient and Hessian as matrix operations.
The full matrix expression of the gradient can be written as
Note that a naive computation of the full gradient evaluates the Huber norm for every pair of points, which requires two nested for-loops.
To speed up this process, the gradient can be written as the following matrix multiplication: $$G = \nabla\varphi( \scriptsize \left(\begin{array}{cccccccc} 1 & -1 & 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & -1 & 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 0 & 0 & 0 & 0 & \cdots & 0 & -1 \\ 0 & 1 & -1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & 0 & \cdots & 1 & -1 \end{array}\right)\left(\begin{array}{c} x_{1} \\ x_{2} \\ x_{3} \\ \vdots \\ x_{n-1} \\ x_{n} \end{array}\right)) $$
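The following is a minimal sketch of this vectorization, assuming scalar coordinates $x_{i}$ and the standard Huber derivative with threshold `delta`; the function and variable names are illustrative and not the project's actual code.

```python
import numpy as np

def huber_grad(t, delta):
    """Elementwise derivative of the standard Huber function (assumed form)."""
    return np.where(np.abs(t) <= delta, t / delta, np.sign(t))

def pairwise_grad_loops(x, delta):
    """Naive version: two nested for-loops over all pairs (i, j), i < j."""
    n = x.size
    g = np.zeros(n)
    for i in range(n):
        for j in range(i + 1, n):
            d = huber_grad(x[i] - x[j], delta)
            g[i] += d
            g[j] -= d
    return g  # gradient of the pairwise Huber term (up to the lambda factor)

def pairwise_grad_vectorized(x, delta):
    """Vectorized version: apply the Huber derivative to the matrix of
    pairwise differences and sum over rows, avoiding the double loop."""
    D = huber_grad(x[:, None] - x[None, :], delta)  # D[i, j] = huber'(x_i - x_j)
    return D.sum(axis=1)  # oddness of huber' handles the j-side sign
```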
In Newton's method, it is also necessary to calculate
Similarly, we first write out the detailed form of the Hessian of the second term, which is:
It is difficult to express the Hessian as the product of two matrices, so we express it as the sum of two matrices: $$ H = \nabla^{2}\varphi(\operatorname{diag}(JB) + F) $$
As for the $\operatorname{diag}(JB)$ term, the matrix product $JB$ is
$$
JB= \scriptsize
\begin{pmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & 0 & 1 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 0
\end{pmatrix}
\begin{pmatrix}
x_{1}-x_{1} & x_{1}-x_{2} & x_{1}-x_{3} & \cdots & x_{1}-x_{n} \\
x_{1}-x_{2} & x_{2}-x_{2} & x_{2}-x_{3} & \cdots & x_{2}-x_{n} \\
x_{1}-x_{3} & x_{2}-x_{3} & x_{3}-x_{3} & \cdots & x_{3}-x_{n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{1}-x_{n} & x_{2}-x_{n} & x_{3}-x_{n} & \cdots & x_{n}-x_{n}
\end{pmatrix}
$$
Thus,
And for the $F$ term:
Our experiments show that this matrix-based computation reduces the CPU time of each iteration by 50%.
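As an illustration of the matrix-based assembly, here is a minimal sketch of the $\operatorname{diag}(JB)$ term above, again assuming scalar coordinates; it is not the project's implementation.

```python
import numpy as np

def diag_JB(x):
    """Sketch of the diag(JB) term above, assuming scalar coordinates x_i.
    J is the all-ones matrix with zero diagonal;
    B[i, j] = x_min(i,j) - x_max(i,j), as in the displayed matrix."""
    n = x.size
    D = x[:, None] - x[None, :]        # D[i, j] = x_i - x_j
    B = np.triu(D) + np.triu(D, 1).T   # mirror the upper triangle to match the display
    J = np.ones((n, n)) - np.eye(n)    # ones off the diagonal, zeros on it
    return np.diag(J @ B)              # only the diagonal enters diag(JB) + F
```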
Algorithm of AGM (Accelerated Gradient Method)
- Initialization: Choose a point $x^{0}$ and set $x^{-1} = x^{0}$.
- Select $\beta_{k}$ and compute the extrapolation step $y^{k+1} = x^{k}+\beta_{k}(x^{k} - x^{k-1})$.
- Select a step size $\alpha_{k}>0$ and set $x^{k+1} = y^{k+1} - \alpha_{k} \nabla f(y^{k+1})$.
Here we choose a constant stepsize with the following parameters:
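A minimal sketch of this AGM loop with a constant stepsize is given below; `grad_f`, `alpha`, `beta`, and the tolerance are placeholders rather than the parameter values actually used in the project.

```python
import numpy as np

def agm(grad_f, x0, alpha=1e-3, beta=0.9, max_iter=1000, eps=1e-6):
    """Sketch of the AGM iteration above with a constant stepsize.
    All parameter values here are illustrative placeholders."""
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(max_iter):
        y = x + beta * (x - x_prev)           # y^{k+1} = x^k + beta_k (x^k - x^{k-1})
        x_prev, x = x, y - alpha * grad_f(y)  # gradient step from the extrapolated point
        if np.linalg.norm(grad_f(x)) <= eps:  # stop once the gradient is small
            break
    return x
```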
Algorithm of NCG (Newton-CG)
- Initialization: Choose a point $x^{0}$ and choose $\sigma, \gamma \in (0,1)$ and a sequence $(\rho_{k})_{k}$, $k \in \mathbb{N}$.
- Set $A = \nabla^{2}f(x^{k})$, $v^{0}=0$, $r^{0}=\nabla f(x^{k})$, $p^{0} = -r^{0}$, and $\text{tol} = \rho_{k}\lVert \nabla f(x^{k})\rVert$.
- For $j = 0,1,\dots$
  - If $(p^{j})^{\top}Ap^{j} \le 0$, return $d^{k} = v^{j}$.
  - Compute $\sigma_{j} = {\lVert r^{j} \rVert^{2} \over (p^{j})^{\top}Ap^{j}}$ and set $v^{j+1} = v^{j} + \sigma_{j}p^{j}$, $r^{j+1} = r^{j} + \sigma_{j}Ap^{j}$.
  - If $\lVert r^{j+1} \rVert \le \text{tol}$, then STOP and return $d^{k} = v^{j+1}$.
  - Calculate $\beta_{j+1} = {\lVert r^{j+1} \rVert^{2} \over \lVert r^{j} \rVert^{2}}$ and $p^{j+1}=-r^{j+1}+\beta_{j+1}p^{j}$.
- If
- Calculate $\alpha_{k}$ via backtracking and update $x^{k}$.
- If the gradient norm is less than $\epsilon$, then STOP and output $x^{k}$.
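A minimal sketch of the inner CG loop that computes the Newton direction $d^{k}$ is given below, assuming the Hessian is available as an explicit matrix `A`; the names and default values are illustrative only.

```python
import numpy as np

def newton_cg_direction(A, grad, rho_k=0.1, max_cg_iter=200):
    """Sketch of the inner CG loop above: approximately solve A d = -grad,
    where A is the Hessian at x^k and grad is the gradient at x^k."""
    v = np.zeros_like(grad)
    r = grad.copy()                      # r^0 = grad f(x^k)
    p = -r
    tol = rho_k * np.linalg.norm(grad)   # tol = rho_k * ||grad f(x^k)||
    for _ in range(max_cg_iter):
        Ap = A @ p
        curv = p @ Ap
        if curv <= 0:                    # non-positive curvature: return current iterate
            return v
        sigma = (r @ r) / curv
        v = v + sigma * p
        r_new = r + sigma * Ap
        if np.linalg.norm(r_new) <= tol: # residual small enough: accept v as d^k
            return v
        beta = (r_new @ r_new) / (r @ r)
        p = -r_new + beta * p
        r = r_new
    return v
```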