google-research/augmix

NaN loss for ResNeXt backbone trained on CIFAR-100

devavratTomar opened this issue · 1 comment

Thank you for your work. While running your code with the ResNeXt backbone on CIFAR-100, I get NaN values for the training loss. As described in the published paper, I use an initial learning rate of 0.1 for SGD with cosine scheduling.
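For reference, the cosine learning-rate schedule mentioned above can be sketched as below; the function name and step granularity are illustrative and not taken from the repository:

```python
import math

def cosine_lr(initial_lr, step, total_steps):
    """Cosine-annealed learning rate, decaying from initial_lr to ~0."""
    return 0.5 * initial_lr * (1 + math.cos(math.pi * step / total_steps))

# The rate starts at the initial value (0.1 here) and decays
# smoothly toward 0 by the final step.
print(cosine_lr(0.1, 0, 100))    # 0.1 at step 0
print(cosine_lr(0.1, 50, 100))   # roughly half the initial rate at midpoint
```

Since the schedule starts at the full 0.1, the first training steps see the largest rate, which is where divergence to NaN typically begins if the loss blows up.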

Yes, same here.
Could you please help with this?

Thanks,
Sara