pytorch-rl

A list of references to my reimplementations of RL algorithms:

  • Asynchronous Methods for Deep Reinforcement Learning (A3C) (arxiv, my code)

  • Advantage Actor Critic (A2C) (my code)

  • Proximal Policy Optimization Algorithms (PPO) (arxiv, my code)

  • Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) (arxiv, my code)

  • Trust Region Policy Optimization (TRPO) (arxiv, my code)

  • Continuous Deep Q-Learning with Model-based Acceleration (NAF) (arxiv, my code)

TODO (volunteers are welcome)

  • Move TRPO into the a2c-ppo-acktr code and implement it as a Hessian-free optimizer (in the same way that ACKTR is implemented as a KFAC optimizer)
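
For anyone picking up the TODO item above, here is a rough sketch of the core of a Hessian-free step: solving H x = g with conjugate gradient, using only matrix-vector products so that H is never formed explicitly. It is plain Python so that it runs without dependencies. The names `conjugate_gradient`, `hvp`, and `toy_hvp` are hypothetical and do not come from the a2c-ppo-acktr code. In a real TRPO port, `hvp` would presumably be a Fisher-vector product computed with autograd, which is an assumption and not shown here.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(hvp: Callable[[Vector], Vector], g: Vector,
                       iters: int = 10, tol: float = 1e-10) -> Vector:
    """Approximately solve H x = g, given only a function v -> H v.

    Assumes H is symmetric positive definite.
    """
    x = [0.0] * len(g)
    r = list(g)  # residual g - H x (x starts at zero)
    p = list(g)  # current search direction
    rr = dot(r, r)
    for _ in range(iters):
        if rr < tol:
            break
        hp = hvp(p)
        alpha = rr / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Toy stand-in for a Hessian-vector product (hypothetical 2x2 SPD matrix).
H = [[4.0, 1.0], [1.0, 3.0]]


def toy_hvp(v: Vector) -> Vector:
    return [dot(row, v) for row in H]


step = conjugate_gradient(toy_hvp, [1.0, 2.0])
```

Wrapping this in an optimizer-style interface would keep it parallel to how ACKTR sits behind a KFAC optimizer, but that design is left open.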