Khrylx/PyTorch-RL
PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.
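The "fast Fisher vector product" in the description refers to a standard TRPO trick: the natural-gradient step `F⁻¹g` is found by conjugate gradient, which only needs products `F·v` and never materializes the Fisher matrix `F`. A minimal NumPy sketch of that CG solve, using a toy diagonal Gaussian policy whose Fisher matrix is known in closed form (the function names and toy numbers here are illustrative, not this repository's API):

```python
import numpy as np

def conjugate_gradient(fvp, g, iters=10, tol=1e-10):
    """Solve F x = g using only Fisher-vector products fvp(v) = F @ v."""
    x = np.zeros_like(g)
    r = g.copy()            # residual g - F x (x starts at 0)
    p = r.copy()            # search direction
    rs_old = r @ r
    for _ in range(iters):
        Fp = fvp(p)
        alpha = rs_old / (p @ Fp)
        x += alpha * p
        r -= alpha * Fp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy policy N(mu, std^2): w.r.t. params (mu, log_std) the Fisher matrix
# is diag(1/std^2, 2), so fvp is an elementwise product and F is never built.
std = np.array([0.5, 1.0, 2.0])
fisher_diag = np.concatenate([1.0 / std**2, 2.0 * np.ones_like(std)])
fvp = lambda v: fisher_diag * v

g = np.ones(6)                      # placeholder surrogate-loss gradient
step = conjugate_gradient(fvp, g)   # natural-gradient direction F^{-1} g
```

In the actual PyTorch setting, `fvp` is instead implemented by double backpropagation through the KL divergence (the Pearlmutter trick), which is what makes the product "fast" relative to forming the full matrix.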
Python · MIT License
Issues
- Failure to train GAIL in the Ant-v2 environment (#28, opened by seolhokim, 1 comment)
- TRPO: is fixed_log_probs the same as log_probs? (#35, opened by yongpan0715, 0 comments)
- Is the implemented performance comparable with the results in the original GAIL paper? (#34, opened by huang-fuxian, 1 comment)
- Various questions (#26, opened by lviano, 0 comments)
- A question about the PPO implementation (#33, opened by pengzhi1998, 0 comments)
- About computing the Hessian-vector product (#32, opened by jjjhfffjj, 0 comments)
- Implementation problem (#27, opened by pengzhi1998, 4 comments)
- Mountain Car (#24, opened by jpark0315, 1 comment)
- Question on multiprocessing (#22, opened by pengzhi1998, 2 comments)
- Doubt regarding the calculation of the advantage (#23, opened by nesarasr, 3 comments)
- About the KL divergence (#21, opened by yangyiqin-tsinghua, 1 comment)
- Is this an error: num_steps += (t + 1)? (#20, opened by pprivulet, 4 comments)
- Question about A2C (#17, opened by kishanpb, 0 comments)
- Confusion about advantage computation (#16, opened by gunshi, 0 comments)
- Question about weight initialization (#14, opened by gunshi, 1 comment)
- Not able to run the TRPO example on GPU (#12, opened by avijit9, 1 comment)
- TRPO: KL divergence computation (#11, opened by sandeepnRES, 4 comments)
- Training a recurrent policy (#4, opened by erschmidt, 1 comment)
- A few runtime errors (#10, opened by sandeepnRES, 2 comments)
- Entropy term for GAIL (#9, opened by sandeepnRES, 1 comment)
- CNN policy (#8, opened by bbalaji-ucsd, 2 comments)
- Result is not good (#7, 2 comments)
- Autograd import error (#3, opened by aseembits93, 2 comments)
- Memory leak during GPU training (#2, opened by erschmidt, 4 comments)
- CudnnRNN is not differentiable twice (#1, opened by erschmidt)