
Temporally abstract actor-critic for continuous control

Apache License 2.0

TAAC: Temporally abstract actor-critic for continuous control

TAAC diagram

This repo releases the code for

TAAC: Temporally Abstract Actor-Critic for Continuous Control, Yu et al., NeurIPS 2021.

It also contains the experiment configuration files for training TAAC on the 14 continuous control tasks, in 5 categories, used in the paper.

What is TAAC?

In a nutshell, TAAC is an off-policy (sample efficient!) actor-critic algorithm that has closed-loop action repetition (temporal abstraction!) built in.

  • TAAC is in the middle ground between "flat" RL (e.g., SAC) and hierarchical RL (e.g., options, goals, etc.).
  • TAAC is conceptually simple. Its implementation closely resembles SAC.
  • TAAC natively supports unbiased multi-step TD backup, with a novel compare-through operator!
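To make "closed-loop action repetition" concrete, here is a toy sketch in Python. It is not the ALF implementation: `propose_action` and `should_repeat` are hypothetical stand-ins for the learned actor and the learned repeat-or-switch decision, and the 1-D states are invented for illustration. The only point it shows is that the repeat decision is re-made at every step from the current state (closed loop), rather than committing to a fixed repeat length in advance (open loop).

```python
from typing import Callable, List, Optional, Tuple


def rollout(
    states: List[float],
    propose_action: Callable[[float], float],
    should_repeat: Callable[[float, float, float], bool],
) -> List[Tuple[float, bool]]:
    """Return one (action, was_repeated) pair per state.

    At every step a candidate action is proposed, then a binary decision
    picks between the previous action and the candidate.
    """
    trajectory: List[Tuple[float, bool]] = []
    prev_action: Optional[float] = None
    for state in states:
        candidate = propose_action(state)
        # The first step has nothing to repeat, so it always uses the candidate.
        repeat = prev_action is not None and should_repeat(
            state, prev_action, candidate)
        action = prev_action if repeat else candidate
        trajectory.append((action, repeat))
        prev_action = action
    return trajectory


# Hypothetical stand-ins (assumptions, not TAAC's learned networks).
def propose_action(state: float) -> float:
    return -state  # toy controller: push the state toward zero


def should_repeat(state: float, prev: float, candidate: float) -> bool:
    return abs(candidate - prev) < 0.5  # placeholder rule for "non-critical"


demo = rollout([1.0, 1.2, 1.3, 3.0, 3.1], propose_action, should_repeat)
for action, repeated in demo:
    print(f"action={action:+.1f} repeated={repeated}")
```

In this toy run the first action is held for three steps, a new action is generated when the state jumps, and that new action is then held again. The threshold rule is only a placeholder for whatever criterion the trained agent uses.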

Highlights of TAAC

TAAC largely outperformed several strong baselines on 14 complex continuous control tasks:

TAAC performance

TAAC learns to skip generating new actions at non-critical states, saving the actor network's representational power for critical states!

TAAC pattern

More highlights can be found on this poster.

A detailed walkthrough of TAAC is in this video.

Installation

Our experiments use the training pipelines and algorithms of Agent Learning Framework (ALF). ALF currently supports Python 3.7+, and Virtualenv is recommended for the installation. After activating a virtual environment, download and install ALF:

git clone https://github.com/HorizonRobotics/alf
cd alf
git checkout fb30ce1 -B taac
pip install -e .

On top of the basic ALF installation:

  • One task category (Terrain) requires installing box2d-py.
  • Two task categories (Manipulation and Locomotion) require installing MuJoCo. Our experiments use MuJoCo 2.0, and a different version might produce different training results, so we suggest using this exact version for reproducibility. Please follow the instructions at https://github.com/openai/mujoco-py.
  • One task category (Driving) requires installing CARLA; we used version 0.9.9 in the experiments. Installation instructions can be found in <ALF_ROOT>/alf/environments/suite_carla.py.
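As a quick sanity check before launching jobs, the following sketch reports which of the optional dependencies listed above can be found in the active environment. It is a convenience script, not part of ALF or this repo. The import names are what the packages are usually imported as (box2d-py as `Box2D`, mujoco-py as `mujoco_py`, the CARLA Python API as `carla`); the helper name `check_optional_deps` is hypothetical. Note that it only checks whether a module can be located, not which version is installed.

```python
import importlib.util
from typing import Dict

# Task category -> Python module it depends on (mirrors the list above).
CATEGORY_MODULES: Dict[str, str] = {
    "Terrain": "Box2D",           # installed by box2d-py
    "Manipulation": "mujoco_py",  # installed by mujoco-py
    "Locomotion": "mujoco_py",
    "Driving": "carla",           # CARLA Python API
}


def check_optional_deps(modules: Dict[str, str] = CATEGORY_MODULES) -> Dict[str, bool]:
    """Map each category to True if its module can be located, else False."""
    return {
        category: importlib.util.find_spec(module) is not None
        for category, module in modules.items()
    }


if __name__ == "__main__":
    for category, found in check_optional_deps().items():
        status = "found" if found else "MISSING"
        print(f"{category:<13} {CATEGORY_MODULES[category]:<10} {status}")
```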

After the installation, clone this repo under ALF:

cd <ALF_ROOT>/alf/examples
git clone https://github.com/hnyu/taac

Run experiments

To run an experiment (e.g., training TAAC on BipedalWalker-v2):

cd <ALF_ROOT>/alf/examples
python -m alf.bin.train --root_dir=<TRAIN_JOB_DIR> --gin_file taac/experiments/taac/taac_terrain.gin --gin_param="create_environment.env_name='BipedalWalker-v2'"

Then open TensorBoard to view the training results:

tensorboard --logdir=<TRAIN_JOB_DIR>

Tasks

The 14 tasks can be trained by providing the corresponding environment names to the 5 gin files:

gin file                      create_environment.env_name
<method>_simple_control.gin   "MountainCarContinuous-v0"
                              "LunarLanderContinuous-v2"
                              "InvertedDoublePendulum-v2"
<method>_locomotion.gin       "Hopper-v2"
                              "Ant-v2"
                              "Walker2d-v2"
                              "HalfCheetah-v2"
<method>_terrain.gin          "BipedalWalker-v2"
                              "BipedalWalkerHardcore-v2"
<method>_manipulation.gin     "FetchReach-v1"
                              "FetchPush-v1"
                              "FetchSlide-v1"
                              "FetchPickAndPlace-v1"
<method>_driving.gin          "Town01"
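The table above can be turned into a small helper that builds one training command per task. This is a sketch under stated assumptions: the gin path pattern `taac/experiments/<method>/<method>_<category>.gin` is inferred from the single BipedalWalker-v2 example in this README, and `build_train_command` and `BASE_DIR` are hypothetical names. The script only builds the argument lists and does not launch anything.

```python
from typing import Dict, List

# Gin file suffix -> environment names, copied from the table above.
TASKS: Dict[str, List[str]] = {
    "simple_control": ["MountainCarContinuous-v0", "LunarLanderContinuous-v2",
                       "InvertedDoublePendulum-v2"],
    "locomotion": ["Hopper-v2", "Ant-v2", "Walker2d-v2", "HalfCheetah-v2"],
    "terrain": ["BipedalWalker-v2", "BipedalWalkerHardcore-v2"],
    "manipulation": ["FetchReach-v1", "FetchPush-v1", "FetchSlide-v1",
                     "FetchPickAndPlace-v1"],
    "driving": ["Town01"],
}


def build_train_command(category: str, env_name: str, root_dir: str,
                        method: str = "taac") -> List[str]:
    """Build the argument list for one training job (path pattern is assumed)."""
    gin_file = f"taac/experiments/{method}/{method}_{category}.gin"
    return [
        "python", "-m", "alf.bin.train",
        f"--root_dir={root_dir}",
        "--gin_file", gin_file,
        f"--gin_param=create_environment.env_name='{env_name}'",
    ]


BASE_DIR = "/tmp/taac_jobs"  # hypothetical output location

commands = [
    build_train_command(category, env, f"{BASE_DIR}/{category}/{env}")
    for category, envs in TASKS.items()
    for env in envs
]

for cmd in commands:
    print(cmd)
```

Each list can be passed to `subprocess.run(cmd, cwd=...)` with the working directory set to <ALF_ROOT>/alf/examples, which avoids shell quoting issues around the single quotes in `--gin_param`.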

Code reading

The entire TAAC algorithm is implemented in the file alf/algorithms/taac_algorithm.py of the downloaded ALF repo.

Troubleshooting

  • If a job complains that rsync cannot be found (ALF uses rsync to back up the training code), install rsync and try again, or simply append the flag --nostore_snapshot when launching the job.
  • CARLA "Fail to start server": just give it another try.
  • If pip reports that Python.h cannot be found while installing ALF, first install the Python development package, e.g., sudo apt install python3.7-dev.
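For the rsync issue, a launcher script can decide automatically whether to add the flag. The sketch below is an illustration, not part of ALF: `snapshot_flags` is a hypothetical helper, while `--nostore_snapshot` is the flag mentioned above and `shutil.which` is the standard-library way to look up an executable on PATH.

```python
import shutil
from typing import List, Optional


def snapshot_flags(rsync_path: Optional[str]) -> List[str]:
    """Return extra train flags: skip the code snapshot when rsync is absent."""
    return [] if rsync_path else ["--nostore_snapshot"]


# shutil.which returns the executable's path, or None if it is not on PATH.
extra_flags = snapshot_flags(shutil.which("rsync"))
print("extra flags for alf.bin.train:", extra_flags)
```

The resulting list can be appended to the training command's arguments before launching the job.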

Citation

If you use TAAC in your research, please consider citing:

@inproceedings{Yu2021TAAC,
    author={Haonan Yu and Wei Xu and Haichao Zhang},
    title={TAAC: Temporally Abstract Actor-Critic for Continuous Control},
    booktitle={NeurIPS},
    year={2021}
}