IssamLaradji/sls

Independence of the step-size and stochastic gradient?

Algue-Rythme opened this issue · 2 comments

Dear authors,
Thanks for this work.

According to the paper, Appendix F.1 on page 25: "To enforce independence of the step-size and stochastic gradient, we perform a backtracking line-search at the current iterate w_k using a mini-batch of examples that is independent of the mini-batch on which ∇f_ik(w_k) is evaluated."
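For reference, the stochastic Armijo condition from the main paper is, if I read it correctly, f_ik(w_k - eta_k ∇f_ik(w_k)) <= f_ik(w_k) - c * eta_k * ||∇f_ik(w_k)||^2, with the backtracking shrinking eta_k until this holds.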

I am not sure I understand this correctly:

  • do you perform all of the Armijo line-search computations with a batch i (and its gradient) to find a learning rate Eta, and only then use Eta to take a gradient step with the gradient of a different batch j? (See the sketch below for what I mean.)
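
To make the question concrete, here is a rough, self-contained sketch of the two-batch scheme I have in mind. The toy loss and helper names (e.g. loss_on_batch) are mine, not from the repo, and the exact form of the Armijo check across batches is only my guess:

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)          # current iterate w_k

def loss_on_batch(w, batch):
    x, y = batch
    return ((x @ w - y) ** 2).mean()            # toy least-squares loss

batch_i = (torch.randn(8, 5), torch.randn(8))   # batch used for the line search
batch_j = (torch.randn(8, 5), torch.randn(8))   # independent batch used for the step

# 1) Armijo backtracking entirely on batch i to pick eta
c, beta, eta = 0.1, 0.9, 1.0
loss_i = loss_on_batch(w, batch_i)
grad_i, = torch.autograd.grad(loss_i, w)
while loss_on_batch(w - eta * grad_i, batch_i) > loss_i - c * eta * grad_i.norm() ** 2:
    eta *= beta

# 2) gradient step taken with the gradient of batch j, using the eta found above
loss_j = loss_on_batch(w, batch_j)
grad_j, = torch.autograd.grad(loss_j, w)
with torch.no_grad():
    w -= eta * grad_j
```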

I ask because I read the implementation of Sls, and it seems that you use the same batch (x, y) for both the line search and the update, contrary to what the paper specifies. I understand why: you use the closure() function to perform the Armijo backtracking, and the last iterate of the backtracking is actually used as the final step (still with the same closure() function).

Is there anywhere else in the code where you use the trick of forcing independence between Eta and the gradient?

Thank you very much

Dear authors,
@IssamLaradji
I am still interested in this topic and would be glad if you could answer my question.
Thank you

Hi Algue,

Yes, your understanding of the implementation is right. SLS uses the same batch (x, y) for both the line search and the update, which is in line with the description of SLS in the main paper: there, the line search is performed on the same batch (x, y) on which the gradients were computed. The closure is a proxy for the loss function and returns the loss at the current parameters.
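
Roughly, the usage looks like this (simplified sketch; the constructor arguments and exact call signature here are approximate, so please check the examples in the repo for the real usage):

```python
import torch
import sls  # the optimizer from this repository

model = torch.nn.Linear(10, 1)
loss_fn = torch.nn.MSELoss()
opt = sls.Sls(model.parameters())  # constructor arguments assumed, see the repo

x, y = torch.randn(8, 10), torch.randn(8, 1)   # one toy mini-batch

# The same (x, y) is used to compute the gradient and to run the backtracking:
# step() calls closure() repeatedly to re-evaluate the loss on this batch at
# each candidate step size, and the last accepted point becomes the new iterate.
def closure():
    return loss_fn(model(x), y)

opt.zero_grad()
opt.step(closure)
```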

Regards,