A small, modular library containing implementations of continuous-control reinforcement learning algorithms. Fully compatible with OpenAI Gym.
Original repository: https://github.com/osudrl/apex
Ubuntu 20.04, MuJoCo 200, Python 3.10
Create conda environment
conda create -n cassie python=3.10
conda activate cassie
Install required Python packages
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install scipy matplotlib pandas lxml tensorboard ray
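To confirm the environment is set up correctly, a quick sanity check (illustrative only, not part of the repository) is:

```python
# Quick sanity check that the core dependencies installed correctly.
import torch
import scipy
import pandas
import ray

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # expect True with pytorch-cuda=11.8 and a GPU
print("Ray:", ray.__version__)
```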
Download mujoco200 and your activation key (mjkey.txt) to ~/.mujoco, then add
export MUJOCO_KEY_PATH=/home/xxx/.mujoco/mjkey.txt
to the end of ~/.bashrc.
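To verify the key is where the library will look for it, a small check (illustrative only; adjust the path to your username) is:

```python
# Verify that the MuJoCo activation key path resolves.
import os

key_path = os.environ.get("MUJOCO_KEY_PATH", os.path.expanduser("~/.mujoco/mjkey.txt"))
print("MUJOCO_KEY_PATH:", key_path, "exists:", os.path.exists(key_path))
```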
Any algorithm can be run from the train.py entry point.
To run PPO on a Cassie environment:
python train.py ppo --env_name Cassie-v0 --num_procs 12 --run_name experiment01
To run TD3 on the gym environment Walker-v2:
python train.py td3_async --env_name Walker-v2 --num_procs 12 --run_name experiment02
To continue training an existing model:
python train.py ppo --n_itr 1000 --num_procs 4 --run_name experiment03 --previous ${parent/path/of/actor.pt}
To test a trained model:
python train.py eval --path ${parent/path/of/actor.pt}
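The eval entry point above is the supported way to test a policy. Purely for illustration, a minimal standalone rollout could look like the sketch below. It assumes actor.pt was saved with torch.save on a full nn.Module whose class is importable, that its forward() maps an observation tensor directly to an action tensor, and that the gym environment's observation/action dimensions match the trained policy; the path shown is hypothetical.

```python
# Minimal rollout sketch (illustrative; `python train.py eval --path ...` is the supported path).
import torch
import gym

actor_path = "trained_models/ppo/Walker2d-v2/<run_name>/actor.pt"  # hypothetical saved actor
actor = torch.load(actor_path)
actor.eval()

env = gym.make("Walker2d-v2")  # must match the env the policy was trained on
obs = env.reset()
done = False
while not done:
    with torch.no_grad():
        action = actor(torch.as_tensor(obs, dtype=torch.float32))
    obs, reward, done, info = env.step(action.numpy())
env.close()
```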
Tensorboard logging is enabled by default for all algorithms. The logger expects that you supply an argument named logdir, containing the root directory you want to store your logfiles in, and an argument named seed, which is used to seed the pseudorandom number generators.
A basic command line script illustrating this is:
python train.py ars --logdir logs/ars --seed 1337
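For context, "seeding the pseudorandom number generators" amounts to something like the generic sketch below (not the library's exact code); the library handles this for you when you pass --seed.

```python
# Generic illustration of seeding the PRNGs for reproducibility.
import random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

seed_everything(1337)
```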
The resulting directory tree would look something like this:
trained_models/ # directory with all of the saved models and tensorboard logs
└── ars # algorithm name
└── Cassie-v0 # environment name
└── 8b8b12-seed1 # unique run name created with hash of hyperparameters
├── actor.pt # actor network for algo
├── critic.pt # critic network for algo
├── events.out.tfevents # tensorboard binary file
├── experiment.info # readable hyperparameters for this run
└── experiment.pkl # loadable pickle of hyperparameters
Using tensorboard makes it easy to compare experiments and resume training later on.
To see live training progress, run
$ tensorboard --logdir logs/
then navigate to http://localhost:6006/ in your browser.
- Cassie-v0: basic unified environment for walking/running policies
- CassieTraj-v0: unified environment with reference trajectories
- CassiePlayground-v0: environment for executing autonomous missions
- CassieStanding-v0: environment for training standing policies
- Parallelism with Ray
- GAE/TD(lambda) estimators (see the sketch after this list)
- PPO, VPG with ratio objective and with log likelihood objective
- TD3 with Parameter Noise Exploration
- DDPG
- RDPG
- ARS
- Entropy based exploration bonus
- advantage centering (observation normalization WIP)
- SAC
- GPO
- NAF
- SVG
- I2A
- PGPE
- Value Distribution
- Oracle methods (e.g. GPS)
- CUDA support (should be trivial but I don't have a GPU to test on currently)
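As referenced in the feature list above, here is a compact sketch of GAE/TD(lambda) advantage estimation. This is a generic implementation of the technique, not the library's exact code:

```python
# Generic GAE(lambda) advantage estimator (sketch, not the library's implementation).
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages and TD(lambda) return targets for one trajectory.

    rewards:    array of shape [T], reward at each step
    values:     array of shape [T], critic estimate V(s_t) at each step
    last_value: bootstrap value V(s_T) for the state after the final step (0 if terminal)
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        delta = rewards[t] + gamma * next_value - values[t]   # TD error at step t
        gae = delta + gamma * lam * gae                       # discounted sum of TD errors
        advantages[t] = gae
    returns = advantages + np.asarray(values)                 # critic targets
    return advantages, returns
```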
Thanks to @ikostrikov, whose great implementations were used for debugging. Also thanks to @rll for rllab, which inspired a lot of the high-level interface and logging for this library, and to @OpenAI for the original PPO TensorFlow implementation. Thanks to @sfujim for the clean implementations of TD3 and DDPG in PyTorch. Thanks to @modestyachts for the easy-to-understand ARS implementation.