Doubt about gradient transfer to shared model
fedetask opened this issue · 1 comments
fedetask commented
First of all, thanks for publishing this work! It's been really helpful for learning many RL concepts in practice!
In `train.py`, the gradient transfer from the local model to the shared model is performed by:

```python
# Transfers gradients from thread-specific model to shared model
def _transfer_grads_to_shared_model(model, shared_model):
  for param, shared_param in zip(model.parameters(), shared_model.parameters()):
    if shared_param.grad is not None:
      return
    shared_param._grad = param.grad
```
I have a few questions, as this function looks sub-optimal to me:

- Does the function return if `shared_param.grad is not None` in order to avoid multiple workers copying gradients at the same time?
- If that is the case, it looks like those workers' gradients are thrown away, am I correct?
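To make the concern concrete, here is a minimal sketch of the early-return behaviour, using a plain `Param` stand-in for a PyTorch parameter (the class and values are hypothetical, only for illustration): once the shared model already holds gradients, a later worker's transfer is silently skipped rather than accumulated.

```python
class Param:
    """Minimal stand-in for a torch parameter: just holds a .grad attribute."""
    def __init__(self, grad=None):
        self.grad = grad

def transfer_grads(model_params, shared_params):
    # Mirrors the early-return logic under discussion: if the shared model
    # already holds a gradient, this worker's gradients are dropped.
    for param, shared_param in zip(model_params, shared_params):
        if shared_param.grad is not None:
            return
        shared_param.grad = param.grad

shared = [Param(), Param()]

# Worker A transfers first and wins.
transfer_grads([Param(1.0), Param(2.0)], shared)

# Worker B arrives later; its gradients are discarded, not accumulated.
transfer_grads([Param(10.0), Param(20.0)], shared)

print([p.grad for p in shared])  # still [1.0, 2.0]
```

So under this reading, whichever worker writes first determines the update, and every other concurrent worker's gradients are lost for that step.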
If my understanding is not completely wrong, I was wondering if there is a reason for not using a lock when transferring the gradients. Could the function be as follows?
```python
# Transfers gradients from thread-specific model to shared model
def _transfer_grads_to_shared_model(model, shared_model, lock):
  with lock:
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
      shared_param._grad = param.grad
```
The same may also apply to the `optimizer.step()` call.
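As a rough sketch of the proposed fix, again with a hypothetical `Param` stand-in (a real implementation would use `torch.multiprocessing.Lock` and actual parameters): holding the lock serialises the transfers, so no worker's gradients are silently skipped mid-copy.

```python
import threading

class Param:
    """Minimal stand-in for a torch parameter: just holds a .grad attribute."""
    def __init__(self, grad=None):
        self.grad = grad

def transfer_grads_locked(model_params, shared_params, lock):
    # With the lock held, each worker's copy runs to completion; transfers
    # are serialised instead of being dropped by an early return.
    with lock:
        for param, shared_param in zip(model_params, shared_params):
            shared_param.grad = param.grad

lock = threading.Lock()
shared = [Param(), Param()]

def worker(grads):
    transfer_grads_locked([Param(g) for g in grads], shared, lock)
    # In a real training loop, optimizer.step() (and re-zeroing the shared
    # gradients) would arguably belong in the same critical section.

threads = [threading.Thread(target=worker, args=(g,))
           for g in ([1.0, 2.0], [3.0, 4.0])]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The last writer wins, but every transfer completed atomically.
print([p.grad for p in shared])
```

Note this only removes the lost-update race on the copy itself; with Hogwild-style training the lock-free variant is sometimes a deliberate choice, trading occasional clobbered gradients for throughput.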
Kaixhin commented
It's been a long time since I coded this so I can't say for sure, but I think you're right on all counts. I never looked into locks, but it may indeed be a better way to do this.