TianhongDai/hindsight-experience-replay

Why MPI.sum in sync_grad (utils.py)

sritee opened this issue · 1 comment

Why do you sum rather than average the gradients in sync_grads? Won't this result in a different effective learning rate when you run a different number of processes?

@sritee Yes, the only effect is a different effective learning rate. I tried both sum and average, and found that sum achieves better results. In my own opinion (which may not be correct), when we sum the gradients from all MPI workers we get a "stronger" update direction (you can also think of it as a denoising process). In that case, we can use a "larger" learning rate to accelerate training.
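
To make the learning-rate point concrete, here is a minimal sketch that simulates the reduction in plain Python rather than calling MPI. The worker gradients, parameter values, and helper names (`reduce_sum`, `reduce_mean`, `sgd_step`) are hypothetical and are not taken from the repository. It assumes a plain SGD update; adaptive optimizers rescale gradients internally, so the equivalence shown here may not carry over exactly.

```python
# Illustrative simulation only: no MPI is used and all values are made up.

def reduce_sum(worker_grads):
    """Element-wise sum across workers, mimicking an all-reduce with a SUM op."""
    return [sum(vals) for vals in zip(*worker_grads)]


def reduce_mean(worker_grads):
    """Element-wise average across workers."""
    n = len(worker_grads)
    return [g / n for g in reduce_sum(worker_grads)]


def sgd_step(params, grad, lr):
    """One plain SGD update (an assumption; the real optimizer may differ)."""
    return [p - lr * g for p, g in zip(params, grad)]


# Gradients from 4 hypothetical workers for a 2-parameter model.
worker_grads = [[0.5, -1.0], [1.5, 1.0], [1.0, -3.0], [1.0, 3.0]]
params = [1.0, 2.0]
lr = 0.25
n_workers = len(worker_grads)

# Summing the gradients with learning rate lr ...
step_sum = sgd_step(params, reduce_sum(worker_grads), lr)
# ... gives the same update as averaging them with learning rate lr * n_workers.
step_mean_scaled = sgd_step(params, reduce_mean(worker_grads), lr * n_workers)

print(step_sum, step_mean_scaled)
```

Under these assumptions, summing over N workers behaves like averaging with a learning rate N times larger, which is why changing the number of processes changes the effective step size.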