Codebase for Collaborative Evolutionary Reinforcement Learning (CERL), accepted for publication in the Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).
Setup Conda
- Install Anaconda3
- conda create -n $ENV_NAME$ python=3.6.1
- source activate $ENV_NAME$
Install PyTorch version 1.0
- Refer to https://pytorch.org/ for instructions
- conda install pytorch torchvision -c pytorch [GPU version]
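A quick sanity check (optional), run inside the activated environment, to confirm the install:

```python
# Verify the PyTorch install and GPU visibility.
import torch

print(torch.__version__)          # expect 1.0.x
print(torch.cuda.is_available())  # True if the GPU build can see a device
```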
Install Numpy, Cython and Scipy
- pip install numpy==1.15.4
- pip install cython==0.29.2
- pip install scipy==1.1.0
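To check that the pinned versions are the ones that load:

```python
# Confirm the pinned versions imported cleanly.
import cython
import numpy
import scipy

print(numpy.__version__)   # expect 1.15.4
print(cython.__version__)  # expect 0.29.2
print(scipy.__version__)   # expect 1.1.0
```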
Install Mujoco and OpenAI Gym
- Download mjpro150 from https://www.roboti.us/index.html
- Unzip mjpro150 and place it, along with mjkey.txt (the license file), in ~/.mujoco/ (create the .mujoco directory in your home folder)
- pip install -U 'mujoco-py<1.50.2,>=1.50.1'
- pip install 'gym[all]'
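A short smoke test for the MuJoCo + Gym install (assumes mjkey.txt is in place; uses the classic Gym API that matches this mujoco-py version):

```python
# Build a MuJoCo env, take one random step, and confirm it returns data.
import gym

env = gym.make('HalfCheetah-v2')
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
print(obs.shape, reward, done)
env.close()
```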
Code Structure
main.py: Main script that runs everything
core/runner.py: Rollout worker
core/ucb.py: Upper Confidence Bound (UCB) used by the resource manager to select learners (see the sketch after this list)
core/portfolio.py: Portfolio of learners, which can vary in their hyperparameters
core/learner.py: Learner agent encapsulating the algorithm and summary statistics
core/buffer.py: Cyclic replay buffer
core/env_wrapper.py: Wrapper around the Mujoco environment
core/models.py: Actor/Critic models
core/neuroevolution.py: Implements neuroevolution
core/off_policy_algo.py: Implements the off-policy gradient learner (TD3)
core/mod_utils.py: Helper functions
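For intuition, here is a minimal UCB1 sketch of the selection rule core/ucb.py is named after. It is an illustrative toy, not the repo's implementation; ucb_select and true_means are made-up names:

```python
# UCB1: pick the learner with the best mean return plus an exploration bonus.
import math
import random

def ucb_select(value_sums, counts, total, c=1.0):
    """Return the index of the learner maximizing mean + c*sqrt(ln(t)/n)."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # make sure every learner is tried at least once
    return max(range(len(counts)),
               key=lambda i: value_sums[i] / counts[i]
                             + c * math.sqrt(math.log(total) / counts[i]))

# Toy usage: four learners with hidden mean returns. Over time, UCB
# allocates most rollouts to the best learner while still exploring.
true_means = [0.2, 0.5, 0.8, 0.4]
value_sums, counts = [0.0] * 4, [0] * 4
for t in range(1, 501):
    i = ucb_select(value_sums, counts, t)
    reward = true_means[i] + random.gauss(0, 0.1)  # simulated rollout return
    value_sums[i] += reward
    counts[i] += 1
print(counts)  # learner 2 should receive the most allocations
```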
Running Experiments
python main.py -env HalfCheetah-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}
python main.py -env Hopper-v2 -portfolio {10,14} -total_steps 1.5 -seed {2018,2022}
python main.py -env Humanoid-v2 -portfolio {10,14} -total_steps 1 -seed {2018,2022}
python main.py -env Walker2d-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}
python main.py -env Swimmer-v2 -portfolio {10,14} -total_steps 2 -seed {2018,2022}
python main.py -env Hopper-v2 -portfolio {100,102} -total_steps 5 -seed {2018,2022}
where {} represents an inclusive discrete range: {10, 14} --> {10, 11, 12, 13, 14}. (The fractional -total_steps values suggest the argument is given in millions of environment steps.)
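Because the braces are documentation notation rather than shell syntax, one way to launch every (portfolio, seed) combination is a small driver loop. A hypothetical sketch for the HalfCheetah configuration, not a script shipped with the repo:

```python
# Expand the inclusive {10,14} x {2018,2022} ranges into individual runs.
import itertools
import subprocess

for portfolio, seed in itertools.product(range(10, 15), range(2018, 2023)):
    subprocess.run(["python", "main.py",
                    "-env", "HalfCheetah-v2",
                    "-portfolio", str(portfolio),
                    "-total_steps", "2",
                    "-seed", str(seed)],
                   check=True)
```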
All rollouts (evaluations of actors in the evolutionary population, as well as the exploratory rollouts conducted by the learners) run in parallel. They are farmed out to different CPU cores and write asynchronously to the collective replay buffer, so slight variations in results are observed even with the same seed.
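The toy sketch below illustrates this pattern: worker processes push transitions into a shared queue, and the consumer drains them in whatever order they arrive. It is illustrative only (rollout_worker and the fake transitions are made up), not the repo's runner code:

```python
# Parallel rollout workers writing asynchronously to a shared buffer.
import multiprocessing as mp
import random

def rollout_worker(queue, num_steps=100):
    # Stand-in for core/runner.py: "roll out" a policy, push transitions.
    state = random.random()
    for _ in range(num_steps):
        action = random.random()  # placeholder for a policy forward pass
        next_state, reward = random.random(), random.random()
        queue.put((state, action, reward, next_state))
        state = next_state

if __name__ == "__main__":
    num_workers, steps = 4, 100
    queue = mp.Queue()
    workers = [mp.Process(target=rollout_worker, args=(queue, steps))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    replay_buffer = []
    for _ in range(num_workers * steps):   # drain as transitions arrive
        replay_buffer.append(queue.get())  # arrival order is nondeterministic
    for w in workers:
        w.join()
    print(len(replay_buffer), "transitions collected")
```

The nondeterministic arrival order is exactly why two runs with the same seed can diverge slightly.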