alexis-jacq/Pytorch-DPPO
PyTorch implementation of Distributed Proximal Policy Optimization (DPPO): https://arxiv.org/abs/1707.02286
Python · MIT License
Issues

- one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [100, 1]], which is output 0 of TBackward, is at version 3; expected version 2 instead. (#9, opened by TJ2333, 2 comments; see the in-place reproducer after this list)
- Question on algorithm itself (#8, opened by QiXuanWang, 2 comments)
- average gradients to update global theta? (#7, opened by weicheng113, 8 comments; see the gradient-averaging sketch after this list)
- Failed in more complex environment (#6, opened by kkjh0723, 1 comment)
- on advantages (#5, opened by cn3c3p, 1 comment)
- Loss questions (#4, opened by wassname, 3 comments)
- clamp ratio (#3, opened by cswhjiang, 1 comment; see the clipped-ratio sketch after this list)
- Old policy? (#2, opened by Kaixhin, 5 comments; the old-policy term appears in the clipped-ratio sketch)
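The error in #9 is PyTorch's autograd version-counter check firing: a tensor that autograd saved for the backward pass was later mutated in place. The following toy reproducer assumes nothing about this repo's code (the [100, 1] tensor in the report is plausibly a critic output over a 100-step batch); it only illustrates the failure mode and the usual fix.

```python
import torch

net = torch.nn.Linear(4, 1)
x = torch.randn(100, 4)

values = net(x)              # [100, 1] output, saved by autograd for backward
loss = (values ** 2).mean()  # pow saves `values` to compute its gradient
values += 1.0                # in-place write bumps the tensor's version counter
# loss.backward()            # would raise: "one of the variables needed for
#                            # gradient computation has been modified by an
#                            # inplace operation"

# The fix is an out-of-place write, which leaves the saved tensor untouched:
values = net(x)
loss = (values ** 2).mean()
values = values + 1.0        # allocates a new tensor instead of mutating
loss.backward()              # succeeds
```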
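Issue #7 asks whether worker gradients are averaged before the global theta is updated. One common reading of that scheme is sketched below, assuming every worker holds a local copy of the same architecture and has already run backward() on its local loss; `apply_averaged_gradients` is a hypothetical helper, not part of this repo.

```python
import torch

def apply_averaged_gradients(global_model, worker_models, optimizer):
    """Average per-parameter gradients across workers, then step the
    global optimizer. Assumes each worker's .grad fields are populated."""
    optimizer.zero_grad()
    n = len(worker_models)
    for g_param, *w_params in zip(global_model.parameters(),
                                  *[m.parameters() for m in worker_models]):
        # Mean of this parameter's gradient over all workers.
        g_param.grad = sum(w.grad for w in w_params) / n
    optimizer.step()  # updates the global theta with the averaged gradient
```

After the step, each worker would typically reload the updated global parameters (e.g., via `load_state_dict`) before collecting the next batch of experience.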
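Issues #3 (clamp ratio) and #2 (Old policy?) both touch PPO's clipped surrogate objective, in which the probability ratio between the current policy and the old policy that collected the data is clamped to [1 - eps, 1 + eps]. A minimal sketch follows; the name `ppo_clip_loss` and the value `eps_clip=0.2` are illustrative choices, not this repo's exact code.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, eps_clip=0.2):
    # Ratio of new to old action probabilities; the old log-probs come from
    # the policy that collected the data and must not carry gradients.
    ratio = torch.exp(log_probs_new - log_probs_old.detach())
    # Unclipped and clipped surrogate terms.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip) * advantages
    # PPO maximizes the elementwise minimum, so the loss is its negation.
    return -torch.min(surr1, surr2).mean()
```

Clamping removes the incentive to move the ratio outside the trust interval, which is why the gradient vanishes for samples whose clipped term is the smaller one.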