ikostrikov/pytorch-a2c-ppo-acktr-gail

PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).

Python · MIT

Issues

Error after python main.py --env-name "PongNoFrameskip-v4"
#271 opened 4 years ago by FulChou
3
get different results when I set the same seed
#237 opened 2 years ago by yunke-wang
3
Updates: Support the latest Atari environment and state entropy maximization-based exploration.
#296 opened 2 years ago by yuanmingqi
0
Why didn't run to generate log?
#294 opened 3 years ago by Can-no
0
Why is episode_rewards negative when running ant_v3 with PPO?
#293 opened 3 years ago by Can-no
0
Where does the expert data for GAIL come from?
#292 opened 3 years ago by YY-GX
0
setup.py and requirements.py have same dependencies except for h5py
#291 opened 3 years ago by andyk
0
Oops! wrong repo :-D
#290 opened 3 years ago by andyk
1
question about the recurrent
#289 opened 3 years ago by rainbow979
1
[Question]Can I use Recurrent_policy for GAIL at this implementation?
#288 opened 3 years ago by LongchaoDa
0
why PPO needs to store action_log_probs instead of using stop_gradient for better efficiency?
#284 opened 3 years ago by Emerald01
1
object has no attribute 'steps' in acktr
#283 opened 3 years ago by sungreong
0
Combine Acktr model with grad-cam
#268 opened 4 years ago by seed851218
2
No softmax before categorical loss?
#282 opened 3 years ago by nirweingarten
0
Operations that have no effect
#281 opened 3 years ago by ArashVahabpour
0
CNN Architecture
#280 opened 3 years ago by araffin
0
Possible bug on the sign of policy log prob. in Fisher computation
#279 opened 3 years ago by daniloefl
0
Stale hidden states
#278 opened 3 years ago by aklein1995
0
Can not run enjoy.py
#277 opened 4 years ago by juanjuan2
0
Can I train in my own game
#276 opened 4 years ago by hhhcwb38712
0
Why acktr algorithm cannot be used in Mujoco settings?
#275 opened 4 years ago by ChenDRAG
0
observation reset before insert
#274 opened 4 years ago by seed851218
0
does mask introduce bias in the gail implementation ?
#272 opened 4 years ago by HareshKarnan
0
Suggestion - implement some "tricks" that improve performance
#266 opened 4 years ago by henrycharlesworth
1
Can't access the trained model files.
#261 opened 4 years ago by TigerVersusT
3
New parallel PyTorchRL library based on this one
#267 opened 4 years ago by giadefa
0
I converted your implementation to tensorflow but it does not work
#264 opened 4 years ago
0
enjoy.py fails. Unexpected argument 'ret'
#263 opened 4 years ago by jakefoster954
2
assert 'NoFrameskip' in env.spec.id
#253 opened 4 years ago by liuqi8827
2
Is setting the flag "use-proper-time-limits" to True recommended for all gym environments with a time limit?
#259 opened 4 years ago by PeixinC
1
Unable to run enjoy.py
#262 opened 4 years ago by jakefoster954
1
Wrong continuous actions
#245 opened 4 years ago by oroojlooy
1
PPO Not Converge for Pendulum-v0
#260 opened 4 years ago by ZhizhenQin
0
Generates a sequence of objc errors on macOS Big Sur
#257 opened 4 years ago
0
Running main.py in PyCharm, results in BrokenPipeError or EOFError
#256 opened 4 years ago
0
adaptive adam learning rate
#252 opened 4 years ago by a-z-e-r-i-l-a
0
should h5py be listed as dependency?
#251 opened 4 years ago by suliuzh
0
GAIL uses AIRL reward function
#236 opened 5 years ago by HareshKarnan
2
What can compute_grad_pen in gail.py do?
#250 opened 4 years ago by ruleGreen
0
EOFError when entering a subprocess worker
#249 opened 4 years ago by Artimisu
0
How to run this examples without tensorflow?
#247 opened 4 years ago by jonndoe
0
Make pretrained models available for Atari Games
#246 opened 4 years ago by asaran
0
Mujoco Reacher-v2 fails to train
#244 opened 4 years ago by oroojlooy
0
Usage of gradient penalty without Wasserstein Loss
#240 opened 4 years ago by mayankg95
1
the usage of after_update in rollout storage
#243 opened 4 years ago by jiangsy
1
leveraging parallel environments for sampling faster
#242 opened 4 years ago by a-z-e-r-i-l-a
0
how can I train my own RL agent to create demonstration?
#241 opened 4 years ago by haoyu-x
0
Incorrect number of environments created when not using VecNormalize
#239 opened 4 years ago by hai-h-nguyen
0
FPS calculation
#238 opened 4 years ago by Xemnas0
3
Insert obs, action in storage (PPO)
#235 opened 5 years ago by mynsng
0