lgcming/PyTorch-RL

PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO.

PyTorch implementation of reinforcement learning algorithms

I try to keep the code clean and readable. This repository contains:

policy gradient methods (TRPO, PPO, A2C)
Generative Adversarial Imitation Learning (GAIL)

Important notes

To run MuJoCo environments, first install mujoco-py and my modified version of gym, which supports MuJoCo 1.50.
If you have a GPU, I recommend setting OMP_NUM_THREADS to 1. PyTorch creates additional threads when performing computations, which can hurt multiprocessing performance. The problem is most serious on Linux, where multiprocessing can be even slower than a single thread:

export OMP_NUM_THREADS=1
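If exporting the variable in the shell is inconvenient, the same setting can be applied from inside a launcher script. The sketch below is only an illustration: `limit_omp_threads` is a hypothetical helper, not part of this repository, and it assumes the variable is set before any numerical library is imported, since OpenMP runtimes generally read it when they first initialise.

```python
import os


def limit_omp_threads(n=1):
    """Set OMP_NUM_THREADS for this process and any child processes it starts.

    Hypothetical helper for illustration; call it before importing torch
    or other libraries that use OpenMP.
    """
    os.environ["OMP_NUM_THREADS"] = str(n)
    return os.environ["OMP_NUM_THREADS"]


limit_omp_threads(1)
# Import numerical libraries (for example torch) only after this point.
```

Child processes started afterwards inherit the environment, so worker processes see the same value.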

Features

Supports CUDA (10x faster than the CPU implementation).
Supports discrete and continuous action spaces.
Supports multiprocessing, so the agent can collect samples in multiple environments simultaneously (8x faster than a single thread).
Fast Fisher vector product calculation.
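To give a feel for what a Fisher vector product is, here is a toy sketch that is not taken from this repository. TRPO needs products of the Fisher information matrix F with a vector v (for conjugate gradient) and never needs F itself. For a softmax distribution with probabilities p, the Fisher matrix with respect to the logits has the closed form diag(p) - p pᵀ, so F v can be computed in linear time without building the matrix. The function names below are hypothetical, and a real policy network would also involve the network's Jacobian.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_product(logits, v):
    """Compute F v for F = diag(p) - p p^T without forming F.

    Component i is p_i * (v_i - sum_j p_j v_j).
    """
    p = softmax(logits)
    pv = sum(pi * vi for pi, vi in zip(p, v))
    return [pi * (vi - pv) for pi, vi in zip(p, v)]


def fisher_matrix(logits):
    """Explicit Fisher matrix, used here only to check the fast product."""
    p = softmax(logits)
    n = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0]
    v = [1.0, 2.0, -0.5]
    print(fisher_vector_product(logits, v))
```

The explicit matrix costs quadratic memory in the number of parameters, which is why implementations work with the product directly.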

Policy gradient methods

Example

python examples/ppo_gym.py --env-name Hopper-v1
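For readers new to PPO, the following is a minimal sketch of the clipped surrogate objective from the PPO paper, written in plain Python for clarity. It is not this repository's code: the function name, the list-based inputs, and the default `clip_epsilon=0.2` are assumptions made for illustration.

```python
import math


def ppo_clipped_surrogate(new_log_probs, old_log_probs, advantages,
                          clip_epsilon=0.2):
    """Mean clipped surrogate objective (to be maximised).

    For each sample, ratio = pi_new(a|s) / pi_old(a|s), and the term is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_epsilon, min(ratio, 1.0 + clip_epsilon))
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Identical policies give ratio 1, so the objective is the mean advantage.
    print(ppo_clipped_surrogate([-1.0, -2.0], [-1.0, -2.0], [0.5, 1.5]))
```

Taking the minimum means a sample stops contributing extra gain once the ratio moves beyond the clip range in the direction that would increase the objective, which limits the size of each policy update.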

Reference

Generative Adversarial Imitation Learning (GAIL)

To save expert trajectories

python gail/save_expert_traj.py --model-path assets/expert_traj/Hopper-v1_ppo.p

To run imitation learning

python gail/gail_gym.py --env-name Hopper-v1 --expert-traj-path assets/expert_traj/Hopper-v1_expert_traj.p
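As background on what imitation learning with GAIL optimises, here is a small sketch of the reward signal under the convention of the GAIL paper, where the discriminator D(s, a) outputs the probability that a state-action pair came from the learner's policy rather than the expert. The policy is then rewarded with -log D(s, a). The function names and the `eps` guard are assumptions for illustration, and the scripts above may use a different convention.

```python
import math


def gail_reward(d_policy_prob, eps=1e-8):
    """Reward for one (state, action) pair given the discriminator output.

    d_policy_prob: discriminator's probability that the pair was generated
    by the learner's policy (close to 0 means it looks like expert data).
    eps guards against log(0); its value is an arbitrary choice.
    """
    return -math.log(max(d_policy_prob, eps))


def discriminator_loss(d_on_policy, d_on_expert, eps=1e-8):
    """Binary cross-entropy: policy pairs are labelled 1, expert pairs 0."""
    policy_term = -sum(math.log(max(d, eps)) for d in d_on_policy)
    expert_term = -sum(math.log(max(1.0 - d, eps)) for d in d_on_expert)
    return (policy_term + expert_term) / (len(d_on_policy) + len(d_on_expert))


if __name__ == "__main__":
    # Pairs the discriminator cannot tell from expert data earn more reward.
    print(gail_reward(0.1), gail_reward(0.9))
```

These rewards replace the environment reward when the policy is updated with a policy gradient method such as PPO or TRPO.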