TimDettmers/sparse_learning

Learning rate for Imagenet

Shiweiliuiiiiiii opened this issue · 1 comments

Hi Tim,

First, thank you for your code.

I noticed that for multi-GPU ImageNet runs you change the default learning rate by multiplying 0.1 by the number of GPUs. Did you actually use this setting to get the performance reported in the paper? Does it improve performance only for sparse training, or for dense training as well?

Many thanks

Good catch, I was not aware of this behavior. I did not change the code any further and trained on 4 GPUs. I have not studied in detail how performance differs if this behavior is changed. It might have affected the results for both sparse and dense training, but since I do not have any data, I cannot say for sure what the effect is.
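
For reference, a minimal sketch of the scaling discussed in this thread, assuming it is a plain product of the base rate 0.1 and the GPU count. The helper name `scaled_lr` is hypothetical and is not taken from the repository; the actual code may compute the value differently.

```python
def scaled_lr(base_lr: float, num_gpus: int) -> float:
    """Hypothetical helper: scale the learning rate linearly with GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus


# Base rate of 0.1 and the 4 GPUs mentioned in the reply above.
lr_4_gpus = scaled_lr(0.1, 4)
print(f"effective learning rate on 4 GPUs: {lr_4_gpus:.1f}")
```

Under this assumption, a 4-GPU run trains with an effective learning rate of about 0.4 rather than 0.1, which is why the results could differ from a single-GPU run using the default.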