This repository is a PyTorch implementation of One Solution is Not All You Need.
The DIAYN part of the code is based on @alirezakazemipour's DIAYN implementation.
Changes:
- Save and load the replay buffer so that training can be paused and resumed
- Automatic tuning of the entropy coefficient alpha
- Include environment rewards when training the policy
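To illustrate the pause / resume change, here is a minimal sketch of a replay buffer that can be written to disk and restored. It is not the repository's actual class: the name `ReplayBuffer`, its methods, and the pickle-based file format are all assumptions made for the example.

```python
import os
import pickle
import random
import tempfile
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions (hypothetical, not the repo's class)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = deque(maxlen=capacity)  # oldest transitions drop out first

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)

    def save(self, path):
        # Write to a temporary file first, then rename, so that an
        # interrupted save does not corrupt the previous checkpoint.
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump({"capacity": self.capacity, "storage": list(self.storage)}, f)
        os.replace(tmp_path, path)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            data = pickle.load(f)
        buffer = cls(data["capacity"])
        buffer.storage.extend(data["storage"])
        return buffer


# Usage: fill a small buffer, save it, and restore it in a "resumed" run.
buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(state=i, action=0, reward=0.0, next_state=i + 1, done=False)
checkpoint = os.path.join(tempfile.mkdtemp(), "buffer.pkl")
buf.save(checkpoint)
restored = ReplayBuffer.load(checkpoint)
```

Saving the buffer alongside the network weights matters for off-policy methods such as SAC: resuming with an empty buffer would otherwise discard the experience collected before the pause.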
Dependencies:
- gym == 0.21
- mujoco-py == 2.1.2.14
- numpy == 1.23.3
- opencv_contrib_python == 4.6.0
- psutil == 5.9.2
- torch == 1.12.1
- tqdm == 4.64.1
pip3 install -r requirements.txt
train.sh MountainCarContinuous-v0
train.sh runs the following command, with its first argument passed as --env_name:
python main_os.py --agent_name SACa --reward_epsilon 10000 --mem_size=100000 --env_name="$1" --n_skills=1 --do_train --auto_entropy_tuning --alpha 0.0
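The command above enables `--auto_entropy_tuning`. As a rough illustration of what automatic tuning of the entropy coefficient usually means in SAC, the sketch below performs one gradient step on the standard temperature loss, written in plain Python with the gradient worked out by hand. Whether `main_os.py` does exactly this is an assumption; the function name `update_log_alpha`, the learning rate, and the target entropy heuristic (minus the action dimension) are illustrative choices, not taken from the repository.

```python
import math


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=3e-4):
    """One gradient-descent step on the SAC temperature loss.

    Loss:      L(log_alpha) = -log_alpha * mean(log_pi + target_entropy)
    Gradient:  dL/dlog_alpha = -mean(log_pi + target_entropy)
    """
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_gap
    return log_alpha - lr * grad


# Assumed heuristic: target entropy = -action_dim (1-D action here).
action_dim = 1
target_entropy = -float(action_dim)

# Policy is more random than the target (entropy estimate 2.0 > -1.0):
# alpha should shrink, so log_alpha moves below its starting value of 0.
lowered = update_log_alpha(0.0, [-2.0, -2.0], target_entropy)

# Policy is less random than the target (entropy estimate -3.0 < -1.0):
# alpha should grow, so log_alpha moves above 0.
raised = update_log_alpha(0.0, [3.0, 3.0], target_entropy)

alpha_lowered, alpha_raised = math.exp(lowered), math.exp(raised)
```

Optimizing `log_alpha` rather than `alpha` keeps the coefficient positive without any clipping. In a PyTorch implementation the same step would typically be an optimizer update on a learnable `log_alpha` tensor.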
- One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL, Kumar et al., 2020
- Diversity is All You Need: Learning Skills without a Reward Function, Eysenbach et al., 2018
Most of the repo is based on @alirezakazemipour's implementation of DIAYN. Thanks also to:
- @ben-eysenbach for sac.
- @p-christ for DIAYN.py.
- @johnlime for RlkitExtension.
- @Dolokhow for rl-algos-tf2.