Can't train to convergence
wuyaozong99 opened this issue · 2 comments
wuyaozong99 commented
Hi, thanks for your implementation!
When I run mnist_fwdgrad.py with the Conv model, the training loss doesn't decrease as in the paper: it drops to about 1.7 and then suddenly jumps above 14. Training diverges and never reaches convergence. Have you ever encountered this problem?
belerico commented
Hi @wuyaozong99, sorry for the late response. Have you run the training with the default hyperparams? Have you tried also with the MLP network?
wuyaozong99 commented
@belerico Thanks for your reply. I found that I had used a fixed learning rate of 2e-4 instead of the default setting, which decays the learning rate over iterations. After switching back to the default settings, the divergence is alleviated.
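For anyone hitting the same issue: the fix above amounts to restoring a per-iteration learning-rate decay instead of a constant 2e-4. A minimal sketch of such a schedule is below; the exact decay form and `decay_rate` value used in mnist_fwdgrad.py may differ, so treat these as illustrative assumptions:

```python
def decayed_lr(initial_lr: float, decay_rate: float, step: int) -> float:
    """Exponential per-step decay: lr_t = lr_0 * decay_rate ** step.

    With forward-gradient updates, a fixed lr can let the weight-perturbation
    noise accumulate and blow up the loss; decaying the step size damps it.
    """
    return initial_lr * decay_rate ** step

# Fixed schedule (what caused divergence here) vs. a decaying one.
fixed = [2e-4 for _ in range(5)]
decayed = [decayed_lr(2e-4, 0.9, t) for t in range(5)]
```

At step 0 both schedules start at 2e-4, but the decayed one shrinks geometrically, which is what stabilizes the later iterations.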