shivram1987/diffGrad

Previous grad is the same as current grad due to call by reference.

thebhatman opened this issue · 7 comments

The previous grad is the same as the current grad for all optimization steps except the first (where prev_grad is zero), because the gradient tensor is stored by reference. Hence the diff is zero for all steps except the first. Printing diff and plotting it against the optimization steps shows this. It can be fixed with state['previous_grad'] = grad.clone(), so the previous grad is stored as an independent copy instead of aliasing the same memory location.
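A minimal sketch of the aliasing behavior described above (variable names here are illustrative, not from the repository): because PyTorch updates a parameter's .grad tensor in place across steps, storing the gradient without .clone() makes the stored "previous" gradient track the current one.

```python
import torch

# Simulate one parameter's gradient tensor, which PyTorch mutates in place
grad = torch.tensor([1.0, 2.0, 3.0])

previous_by_reference = grad      # aliases the same memory (the bug)
previous_by_clone = grad.clone()  # independent copy (the fix)

grad.add_(1.0)  # simulate the next step's in-place gradient update

# With the alias, both names point at one tensor, so the diff is always zero
diff_by_reference = torch.abs(previous_by_reference - grad)
# With the clone, the genuine change between steps is visible
diff_by_clone = torch.abs(previous_by_clone - grad)

print(diff_by_reference)  # tensor([0., 0., 0.])
print(diff_by_clone)      # tensor([1., 1., 1.])
```

This is why diff is nonzero only on the first step in the buggy version: only there does previous_grad hold a separately allocated zero tensor.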

Thank you very much for pointing this out.
I have updated the code after fixing this bug. I have observed even better performance than reported in the paper with updated code.

@shivram1987 The code is still incorrect. In line 80, it should be
exp_avg, exp_avg_sq, previous_grad = state['exp_avg'], state['exp_avg_sq'], state['previous_grad'].clone()

The detailed critical review I presented in the forum discussion for Diffgrad on Fast.ai can be found here: https://forums.fast.ai/t/meet-diffgrad-new-optimizer-that-solves-adams-overshoot-issue/60711/5?u=diganta

@digantamisra98 The current code is correct. Cloning is required at only one place and it is done in line 100.
The performance increases after fixing it as follows:
Using ResNet50 on CIFAR10 dataset with batch size 128: earlier - 94.08%, now - 94.27%
Using ResNet50 on CIFAR10 dataset with batch size 64: earlier - 94.05%, now - 94.24%
Using ResNet50 on CIFAR10 dataset with batch size 32: earlier - 93.9%, now - 94.24%

Correct, my bad. I was appending the previous grad after it was updated with grad. Another question: the DFC coefficient is not a scalar but a tensor, since you're computing the absolute difference of two tensors. Was that intended?

The DFC coefficient is computed element-wise, for each parameter.
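A short sketch of what that means, following the DFC formula from the diffGrad paper, xi = 1 / (1 + exp(-|g_prev - g|)): the coefficient is a sigmoid of the element-wise gradient difference, so it has the same shape as the parameter tensor rather than being a single scalar.

```python
import torch

# Illustrative gradients for a parameter with three elements
previous_grad = torch.tensor([0.5, -1.0, 2.0])
grad = torch.tensor([0.5, 1.0, -2.0])

# DFC: sigmoid of the absolute element-wise gradient change
diff = torch.abs(previous_grad - grad)
dfc = 1.0 / (1.0 + torch.exp(-diff))  # equivalently torch.sigmoid(diff)

print(dfc.shape)  # same shape as the gradient: one coefficient per element
```

Elements whose gradient barely changed get a coefficient near 0.5 (high "friction"), while elements with large gradient change get a coefficient near 1, i.e. close to a plain Adam update.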

The arXiv version of the paper has now been updated with the new diffGrad results after fixing this bug.

Hi, whenever I change the optimizer from Adam to diffGrad, I get this error:
AttributeError: 'Sequential' object has no attribute 'parameters'
I was wondering if you could guide me on how to fix this issue?
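For reference, a torch.nn.Sequential module does expose .parameters(), so this error usually means the model is a different Sequential class (e.g. from Keras) or the optimizer was given something other than a torch module's parameters. A minimal sketch of correct optimizer construction, using torch.optim.Adam as a stand-in (the diffGrad class from this repository would be constructed the same way, with the model's parameters passed in):

```python
import torch
import torch.nn as nn

# A plain torch model; its .parameters() iterator is what the optimizer needs
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Pass the parameter iterator, not the model object itself
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step
loss = model(torch.randn(3, 4)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

If the error persists, it is worth checking that the model really is a torch.nn module and not, say, tf.keras.Sequential.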