pkofod/NLSolvers.jl

Use rank-1 update BLAS kernel for quasi-Newton updates

Opened this issue · 7 comments

We can get a nice speedup if we use `LinearAlgebra.BLAS.ger!` to do the rank-1 updates of the approximate Hessian or inverse Hessian in quasi-Newton methods, instead of `H = H + (w * w') / θ`, which is much slower for large matrices.

We can still fall back to a generic implementation for non-BLAS types.
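A minimal sketch of how this could look, using dispatch to select the kernel. The helper name `rank1update!` is hypothetical (not NLSolvers' actual API); `BLAS.ger!(α, x, y, A)` really does compute `A ← α*x*y' + A` in place.

```julia
using LinearAlgebra

# Generic fallback for any element type: H ← H + (w * w') / θ, in place.
function rank1update!(H::AbstractMatrix, w::AbstractVector, θ)
    α = inv(θ)
    @inbounds for j in eachindex(w), i in eachindex(w)
        H[i, j] += α * w[i] * w[j]
    end
    return H
end

# Fast path for dense BLAS floats, selected automatically by dispatch.
function rank1update!(H::Matrix{T}, w::Vector{T}, θ::T) where {T<:Union{Float32,Float64}}
    return LinearAlgebra.BLAS.ger!(inv(θ), w, w, H)
end
```

Since the second method is strictly more specific, `Matrix{Float64}` inputs hit the BLAS kernel while, say, `BigFloat` matrices hit the loop, with no branching in user code.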

A bit more advanced would be performing rank-1 updates on the factorization of H directly, for Direct approximations, to avoid re-factorizing it every time we call `\`. We already have `lowrankupdate` for `Cholesky`.
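For illustration, a sketch of updating a stored `Cholesky` factorization in place instead of re-factorizing (the matrices here are made up; note that `lowrankupdate!` consumes its vector argument as workspace, hence the `copy`):

```julia
using LinearAlgebra

B = [4.0 1.0; 1.0 3.0]     # current (Direct) approximation, SPD
C = cholesky(B)            # factor once

v = [0.5, 1.0]
lowrankupdate!(C, copy(v)) # C now factors B + v*v'; v is copied because
                           # lowrankupdate! overwrites its vector argument

g = [1.0, 2.0]
x = C \ g                  # solve with the updated factors, no refactorization
```

An O(n²) update per iteration instead of an O(n³) factorization, which is exactly where the quasi-Newton savings would come from.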

Yes, thanks for opening this issue. You are touching on things that I had planned but pushed back in favor of some other functionality, and yes, those things would be cool.

  1. A standard implementation that stays close to the linear algebra/math, to serve as the fallback
  2. Fast implementations for `Array`s of supported element types (say `Float32` and `Float64`, whatever is available) via dispatch
  3. A factorized version of H that is updated in place. I originally thought I was going to prioritize this, but then I read some not-so-great benchmarks and pushed it down the list. But if we already have `lowrankupdate` (I didn't know that) for `Cholesky`, that seems like a no-brainer.

We also need L-BFGS and maybe even variants of it. We also need L-SR1, which seems nice in TR (trust-region) contexts (there are even closed forms for the spectral decomposition in the memoryless versions).

Yes, I can perhaps tackle those issues at some point once the package is more stable. One thing to keep in mind is that for small matrices, the BLAS kernel is actually slower than the naive implementation.

I think such a size check should be negligible?

Yes, for all reasonably sized matrices it should be.

Your objective and gradient calculation would have to be overwhelmingly simple for such a branch to have any effect, I'd say. Either way, we can just have it as an option: `Auto` will check, but specific choices could in principle be inferred if necessary.
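The size-based branch could look like this sketch; both the helper name and the cutoff value are made up, and the right threshold would have to come from benchmarking on each machine:

```julia
using LinearAlgebra

const GER_CUTOFF = 16  # hypothetical, machine-dependent crossover point

function rank1update_auto!(H::Matrix{Float64}, w::Vector{Float64}, θ::Float64)
    if length(w) < GER_CUTOFF
        # naive loop: cheap for small n, avoids BLAS call overhead
        α = inv(θ)
        @inbounds for j in eachindex(w), i in eachindex(w)
            H[i, j] += α * w[i] * w[j]
        end
    else
        # large n: hand off to the BLAS rank-1 kernel
        LinearAlgebra.BLAS.ger!(inv(θ), w, w, H)
    end
    return H
end
```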