hu-po/pySACQ

act() with gradient recording

Opened this issue · 0 comments

Hi there,

Very interesting work! I've really enjoyed reading the paper.

When I looked at the implementation, I noticed that the act function defined in model.py is not wrapped in "with torch.no_grad()", meaning all the actor.forward calls (invoked via actor.predict()) are being recorded for back-propagation.

Is this intended? If so, why is it?

I should say I am pretty new to PyTorch, but as I understand it, none of the operations done in act() should be part of backprop; only the operations done to compute the losses in learn() should be. However, the current implementation accumulates the operations from both act() and learn() in the autograd graph, all of which get back-propagated in learn(), since actor.eval() doesn't disable gradient tracking.
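To illustrate the point, here is a minimal sketch (using a toy linear layer as a stand-in for the actual actor network, not the repo's real model) showing that eval() alone doesn't stop graph recording, while torch.no_grad() does:

```python
import torch

# Toy "actor": a single linear layer standing in for the real policy network.
actor = torch.nn.Linear(4, 2)
state = torch.randn(1, 4)

# eval() only changes layer behavior (dropout, batchnorm); the forward
# pass is still recorded for autograd.
actor.eval()
action_tracked = actor(state)
print(action_tracked.requires_grad)   # graph is being built

# Inside no_grad(), no graph is built, which is what act() arguably wants.
with torch.no_grad():
    action_untracked = actor(state)
print(action_untracked.requires_grad)  # no graph
```

So wrapping the body of act() in a no_grad() context (or detaching the sampled actions) would keep the rollout computations out of the graph that learn() later back-propagates through.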

Feel free to correct me if I got anything wrong.

Thanks,
Yongkee