What is the use of "AllReduce"?
yyk-wew opened this issue · 0 comments
yyk-wew commented
Hello. Thank you for your great work!
I have some questions about the "AllReduce" class defined here.
Lines 226 to 241 in 4388dc1
It is used to gather the probabilities when computing the me-max regularization.
Lines 70 to 72 in 4388dc1
I wonder why "dist.all_reduce(x)" is not used directly. It seems that using "AllReduce" multiplies the gradient by a factor of "world_size".
I would like to know whether I am correct and, if so, why this makes sense.
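To make the question concrete, here is a minimal single-process sketch of the scaling I mean. It does not use torch.distributed. The ranks are simulated with a plain list, and `loss`, `local_only_grads` and `all_reduced_grads` are hypothetical names that I made up for illustration. I am assuming a backward pass that also sums the gradients across ranks, which may not be what the "AllReduce" class in the repository actually does.

```python
# Single-process simulation of an all-reduce over several "ranks".
# Everything here is illustrative; no torch.distributed call is made.

def all_reduce_sum(values):
    """Simulate all_reduce(SUM): every rank ends up with the same total."""
    total = sum(values)
    return [total for _ in values]

def loss_grad(y):
    """Derivative of a hypothetical per-rank loss, loss(y) = y ** 2."""
    return 2 * y

def local_only_grads(xs):
    """Backward treated as identity: rank i keeps only d loss_i / d y."""
    reduced = all_reduce_sum(xs)
    return [loss_grad(y) for y in reduced]

def all_reduced_grads(xs):
    """Assumed backward that also all-reduces: rank i gets sum_j d loss_j / d y."""
    reduced = all_reduce_sum(xs)
    upstream = [loss_grad(y) for y in reduced]
    return all_reduce_sum(upstream)

xs = [1.0, 2.0, 3.0, 4.0]          # one local value per simulated rank
world_size = len(xs)

g_local = local_only_grads(xs)     # [20.0, 20.0, 20.0, 20.0]
g_reduced = all_reduced_grads(xs)  # [80.0, 80.0, 80.0, 80.0]

ratio = g_reduced[0] / g_local[0]
print(ratio)  # prints 4.0, i.e. world_size
```

In this toy setup every rank computes the same loss on the reduced value, so summing the upstream gradients across ranks gives exactly "world_size" times the local-only gradient.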
Thx!