
One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL


This repository is a PyTorch implementation of One Solution is Not All You Need.

The DIAYN part of the code is based on @alirezakazemipour's DIAYN implementation.

Changes:

  • Save and load the replay buffer to enable pausing and resuming training
  • Automatic tuning of the entropy coefficient (alpha)
  • Incorporate environment rewards when training the policy
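The automatic entropy tuning listed above likely follows the standard SAC temperature update, where a learnable log_alpha is adjusted so the policy's entropy tracks a fixed target (commonly the negative action dimension). This is a minimal sketch under that assumption, not the repo's actual code:

```python
import torch

# SAC-style automatic entropy tuning: alpha = exp(log_alpha) is learned
# so that the policy entropy is pushed toward a target entropy,
# commonly -|A| (negative action dimension).
action_dim = 2
target_entropy = -float(action_dim)

log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_probs):
    """One gradient step on the temperature loss.

    log_probs: log pi(a|s) for a batch of sampled actions.
    """
    alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()

# Example: here the batch entropy (-log_probs = 0.5) is above the
# target (-2.0), so the update nudges alpha slightly downward.
batch_log_probs = torch.full((64,), -0.5)
alpha = update_alpha(batch_log_probs)
```

When entropy falls below the target, the same loss pushes alpha up, increasing the entropy bonus in the actor loss.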

Dependencies

  • gym == 0.21
  • mujoco-py == 2.1.2.14
  • numpy == 1.23.3
  • opencv_contrib_python == 4.6.0
  • psutil == 5.9.2
  • torch == 1.12.1
  • tqdm == 4.64.1

Installation

pip3 install -r requirements.txt

Usage

Run the training script with an environment name, e.g.:

./train.sh MountainCarContinuous-v0

which invokes:

python main_os.py --agent_name SACa --reward_epsilon 10000 --mem_size=100000 --env_name="$1" --n_skills=1 --do_train --auto_entropy_tuning --alpha 0.0
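The --reward_epsilon flag presumably corresponds to SMERL's gating of the diversity bonus: the DIAYN skill-discrimination reward is added to the environment reward only on trajectories whose return is within epsilon of the optimal return. A hedged sketch of that rule (the function name and signature are hypothetical, not from this repo):

```python
def smerl_reward(env_return, optimal_return, env_reward,
                 diversity_bonus, epsilon, alpha=1.0):
    """Sketch of the SMERL per-step reward (Kumar et al., 2020):
    add the scaled diversity bonus only when the trajectory's
    environment return is within epsilon of the optimal return."""
    if env_return >= optimal_return - epsilon:
        return env_reward + alpha * diversity_bonus
    return env_reward

# Near-optimal trajectory: diversity bonus is included.
r_near = smerl_reward(95.0, 100.0, 1.0, 0.3, epsilon=10.0)
# Poor trajectory: only the environment reward is used.
r_far = smerl_reward(50.0, 100.0, 1.0, 0.3, epsilon=10.0)
```

A very large epsilon (as in the command above) effectively keeps the diversity bonus on for all trajectories.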

Reference

  1. One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL, Kumar et al., 2020
  2. Diversity is All You Need: Learning Skills without a Reward Function, Eysenbach et al., 2018

Acknowledgment

Most of the repo is based on @alirezakazemipour's implementation of DIAYN.

  1. @ben-eysenbach for sac.
  2. @p-christ for DIAYN.py.
  3. @johnlime for RlkitExtension.
  4. @Dolokhow for rl-algos-tf2.