alexis-jacq/Pytorch-DPPO
PyTorch implementation of Distributed Proximal Policy Optimization (DPPO): https://arxiv.org/abs/1707.02286
Python · MIT License
Issues

- one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 1]], which is output 0 of TBackward, is at version 3; expected version 2 instead. (#9, opened by TJ2333, 2 comments; see the in-place reproducer after this list)
- Question on algorithm itself (#8, opened by QiXuanWang, 2 comments)
- average gradients to update global theta? (#7, opened by weicheng113, 8 comments; see the gradient-averaging sketch after this list)
- Failed in more complex environment (#6, opened by kkjh0723, 1 comment)
- on advantages (#5, opened by cn3c3p, 1 comment)
- Loss questions (#4, opened by wassname, 3 comments)
- clamp ratio (#3, opened by cswhjiang, 1 comment; see the clipped-ratio sketch after this list)
- Old policy? (#2, opened by Kaixhin, 5 comments; the old-policy term appears in the clipped-ratio sketch)
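The error in #9 is PyTorch's autograd version-counter check firing: a tensor that autograd saved for the backward pass was later mutated in place. The following toy reproducer assumes nothing about this repo's code (the [100, 1] tensor in the report is plausibly a critic output over a 100-step batch); it only illustrates the failure mode and the usual fix.

```python
import torch

net = torch.nn.Linear(4, 1)
x = torch.randn(100, 4)

values = net(x)              # [100, 1] output, saved by autograd for backward
loss = (values ** 2).mean()  # pow saves `values` to compute its gradient
values += 1.0                # in-place write bumps the tensor's version counter
# loss.backward()            # would raise: "one of the variables needed for
#                            # gradient computation has been modified by an
#                            # inplace operation"

# The fix is an out-of-place write, which leaves the saved tensor untouched:
values = net(x)
loss = (values ** 2).mean()
values = values + 1.0        # allocates a new tensor instead of mutating
loss.backward()              # succeeds
```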
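Issue #7 asks whether worker gradients are averaged before the global theta is updated. One common reading of that scheme is sketched below, assuming every worker holds a local copy of the same architecture and has already run backward() on its local loss; `apply_averaged_gradients` is a hypothetical helper, not part of this repo.

```python
import torch

def apply_averaged_gradients(global_model, worker_models, optimizer):
    """Average per-parameter gradients across workers, then step the
    global optimizer. Assumes each worker's .grad fields are populated."""
    optimizer.zero_grad()
    n = len(worker_models)
    for g_param, *w_params in zip(global_model.parameters(),
                                  *[m.parameters() for m in worker_models]):
        # Mean of this parameter's gradient over all workers.
        g_param.grad = sum(w.grad for w in w_params) / n
    optimizer.step()  # updates the global theta with the averaged gradient
```

After the step, each worker would typically reload the updated global parameters (e.g., via `load_state_dict`) before collecting the next batch of experience.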
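Issues #3 (clamp ratio) and #2 (Old policy?) both touch PPO's clipped surrogate objective, in which the probability ratio between the current policy and the old policy that collected the data is clamped to [1 - eps, 1 + eps]. A minimal sketch follows; the name `ppo_clip_loss` and the value `eps_clip=0.2` are illustrative choices, not this repo's exact code.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps_clip=0.2):
    # Ratio of new to old action probabilities; the old log-probs come from
    # the policy that collected the data and must not carry gradients.
    ratio = torch.exp(log_probs_new - log_probs_old.detach())
    # Unclipped and clipped surrogate terms.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    # PPO maximizes the elementwise minimum, so the loss is its negation.
    return -torch.min(surr1, surr2).mean()
```

Clamping removes the incentive to move the ratio outside the trust interval, which is why the gradient vanishes for samples whose clipped term is the smaller one.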