Liuhong99/Sophia

Hessian-vector product vs. Hessian estimator


Hi, @Liuhong99

(Sorry for filing this as an issue; it's more of a question about an implementation detail.)
Sophia currently uses a Hessian estimator (Hutchinson or Gauss-Newton-Bartlett) for preconditioning, and the paper also mentions that Sophia can use PyTorch's Hessian-vector product (HVP) to do the same thing. Have you also implemented the latter approach? I'm wondering how large the performance gap between the two approaches is.
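
For context, Hutchinson's estimator only touches the Hessian through Hessian-vector products: for a random vector u with i.i.d. ±1 entries, E[u ⊙ (Hu)] = diag(H). Below is a minimal pure-Python sketch on a hypothetical toy objective f(x) = Σ exp(x_i) + x_0·x_1. As an assumption for portability, the HVP is approximated by a central finite difference of the gradient. This is a swapped-in stand-in for the exact double-backprop HVP you would get in PyTorch (for example, by differentiating the gradient-vector dot product with `torch.autograd.grad(..., create_graph=True)`). None of these function names come from the Sophia repo.

```python
import math
import random


def grad_f(x):
    """Gradient of the hypothetical toy objective f(x) = sum_i exp(x_i) + x_0 * x_1."""
    g = [math.exp(xi) for xi in x]
    g[0] += x[1]
    g[1] += x[0]
    return g


def hvp_finite_diff(grad, x, v, eps=1e-4):
    """Approximate H(x) v with a central difference of the gradient:
    (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)."""
    x_plus = [xi + eps * vi for xi, vi in zip(x, v)]
    x_minus = [xi - eps * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * eps) for a, b in zip(grad(x_plus), grad(x_minus))]


def hutchinson_diag(grad, x, num_samples=4000, seed=0):
    """Hutchinson estimate of diag(H): the average of u * (H u) over Rademacher vectors u."""
    rng = random.Random(seed)
    n = len(x)
    total = [0.0] * n
    for _ in range(num_samples):
        u = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        hu = hvp_finite_diff(grad, x, u)  # the only access to the Hessian
        for i in range(n):
            total[i] += u[i] * hu[i]
    return [t / num_samples for t in total]


x = [0.0, 0.5, -1.0]
est = hutchinson_diag(grad_f, x)
exact = [math.exp(xi) for xi in x]  # true diagonal of the Hessian of f
print(est, exact)
```

Coordinates 0 and 1 pick up sampling noise from the off-diagonal x_0·x_1 coupling, which averages out as the number of samples grows. Coordinate 2 has no coupling, so every sample is already exact.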

Thank you for sharing the code, very helpful for me to understand the paper.

Cheers, -yuan

Hi @zhouyuan,
Thanks for your interest! Both Sophia-G and Sophia-H use Hessian estimators. Hutchinson's estimator (used by Sophia-H) relies on HVP, while GNB (used by Sophia-G) does not. In that sense, Sophia-G is more promising, because it's easier to implement. In our current experiments, GNB also performs better than Hutchinson.
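
To see why GNB needs no HVP, consider how it is formed. Roughly as the Sophia paper describes it, you resample labels from the model's own predicted distribution, take the mini-batch gradient g of the loss under those sampled labels, and use B · g ⊙ g (B = batch size) as the diagonal estimate. The following hedged pure-Python sketch applies this to a hypothetical binary logistic-regression model, not the repo's code, which works on a language model's softmax outputs. For this loss, the average of the estimate should approach the diagonal of the Gauss-Newton matrix, so the sketch also computes that diagonal exactly for comparison. All names here are illustrative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gnb_diag_sample(w, xs, rng):
    """One GNB-style sample of the diagonal for hypothetical binary logistic regression.

    Labels are resampled from the model's own predictions (y_hat ~ Bernoulli(p)),
    g is the gradient of the mean loss under those labels, and the estimate is
    B * g * g elementwise, with B the batch size. Only a gradient is needed.
    """
    batch = len(xs)
    g = [0.0] * len(w)
    for x in xs:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        y_hat = 1.0 if rng.random() < p else 0.0
        for i, xi in enumerate(x):
            g[i] += (p - y_hat) * xi / batch  # d(loss)/dw_i for this example, averaged
    return [batch * gi * gi for gi in g]


def exact_gn_diag(w, xs):
    """Exact diagonal of the Gauss-Newton matrix of the mean logistic loss."""
    batch = len(xs)
    d = [0.0] * len(w)
    for x in xs:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            d[i] += p * (1.0 - p) * xi * xi / batch
    return d


rng = random.Random(0)
w = [0.3, -0.2]
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
num_samples = 20000
avg = [0.0] * len(w)
for _ in range(num_samples):
    sample = gnb_diag_sample(w, xs, rng)
    avg = [a + s / num_samples for a, s in zip(avg, sample)]
exact = exact_gn_diag(w, xs)
print(avg, exact)
```

Because each sampled residual p - y_hat has mean zero and variance p(1 - p), and examples are sampled independently, the factor B cancels the 1/B² from averaging and the estimate is unbiased for the Gauss-Newton diagonal. The whole computation is one ordinary backward pass, with no second-order autograd.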