Can't train to convergence
wuyaozong99 opened this issue · 2 comments
wuyaozong99 commented
Hi, thanks for your implementation!
When I run mnist_fwdgrad.py with the Conv model, the training loss doesn't decrease as in the paper: it drops to about 1.7 and then suddenly jumps above 14. Training diverges and never reaches convergence. Have you ever encountered this problem?
belerico commented
Hi @wuyaozong99, sorry for the late response. Have you run the training with the default hyperparams? Have you tried also with the MLP network?
wuyaozong99 commented
@belerico Thanks for your reply. I found that I had used a fixed learning rate of 2e-4 instead of the default setting, which decays the learning rate over iterations. After switching back to the default settings, the divergence is alleviated.
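For anyone hitting the same issue: the fix above amounts to restoring a per-iteration learning-rate decay instead of a constant 2e-4. A minimal sketch of such a schedule is below; the exact decay form and `decay_rate` value used in mnist_fwdgrad.py may differ, so treat these as illustrative assumptions:

```python
def decayed_lr(initial_lr: float, decay_rate: float, step: int) -> float:
    """Exponential per-step decay: lr_t = lr_0 * decay_rate ** step.

    With forward-gradient updates, a fixed lr can let the weight-perturbation
    noise accumulate and blow up the loss; decaying the step size damps it.
    """
    return initial_lr * decay_rate ** step

# Fixed schedule (what caused divergence here) vs. a decaying one.
fixed = [2e-4 for _ in range(5)]
decayed = [decayed_lr(2e-4, 0.9, t) for t in range(5)]
```

At step 0 both schedules start at 2e-4, but the decayed one shrinks geometrically, which is what stabilizes the later iterations.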