NaN loss for ResNeXt backbone trained on CIFAR-100
devavratTomar commented
Thank you for your work. When I run your code with the ResNeXt backbone on CIFAR-100, the training loss becomes NaN. As described in the published paper, I use SGD with an initial learning rate of 0.1 and cosine learning-rate scheduling.
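For reference, the schedule described above (SGD starting at a learning rate of 0.1, decayed with cosine scheduling) follows the usual cosine-annealing closed form, lr = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * epoch / total_epochs)). The sketch below is a minimal stdlib illustration under assumptions, not the repository's training code. The function name `cosine_lr`, the `min_lr` floor of 0, and the 200-epoch budget are hypothetical choices for illustration.

```python
import math


def cosine_lr(epoch: int, total_epochs: int, base_lr: float = 0.1, min_lr: float = 0.0) -> float:
    """Learning rate at `epoch` under cosine annealing from base_lr down to min_lr.

    Hypothetical helper: base_lr=0.1 matches the value quoted in the issue,
    min_lr=0.0 is an assumption.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    progress = epoch / total_epochs  # fraction of training completed, in [0, 1]
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 200  # hypothetical epoch budget; the issue does not state one
    for e in (0, 50, 100, 150, 200):
        print(e, round(cosine_lr(e, total), 4))
```

Under this schedule the learning rate stays close to 0.1 for the first epochs and reaches half that value at the midpoint of training.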
SaraGhazanfari commented
Yes, same here.
Could you please help with this?
Thanks,
Sara