/policydissect

[NeurIPS 2022] Official implementation of the paper: "Human-AI Shared Control via Policy Dissection"

Primary LanguagePythonApache License 2.0Apache-2.0

Policy Dissection

[NeurIPS 2022] Official implementation of the paper: Human-AI Shared Control via Policy Dissection

Webpage | Code | Video | Paper |

Currently, we provide some interactive neural controllers enabled by Policy Dissection. The policy dissection method and training code will be updated soon.

Supported Environments:

Installation

Basic Installation

# Clone the code to local
git clone https://github.com/metadriverse/policydissect.git
cd policydissect

# Create virtual environment
conda create -n policydissect python=3.7
conda activate policydissect

# install torch
pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Install basic dependency
pip install -e .

IsaacGym Installation (Optional)

For playing with agents trained in IsaacGym, follow the instructions below to install IsaacGym

Please review the file isaacgym/docs/install.html for more information on installation. See the Troubleshooting section for debugging.

Mujoco Installation (Optional)

For playing with the Mujoco-Ant and Mujoco-Walker, please

Play with AI

MetaDrive

To collaborate with the AI driver in MetaDrive environment, run:

# MetaDrive
# Keymap:
# - KEY_W: lane following
# - KEY_A: left lane changing
# - KEY_S: braking
# - KEY_D: right lane changing
# - KEY_R:Reset
python policydissect/scripts/play_metadrive.py

Pybullet Quadrupedal Robot

The quadrupedal robot is trained with the code provided by https://github.com/Mehooz/vision4leg.git. For playing with legged robot, run:

# Pybullet Quadrupedal Robot
# Keymap:
# - KEY_W: forward
# - KEY_A: moving left
# - KEY_S: stop
# - KEY_D: moving right
# - KEY_R: reset
python policydissect/scripts/play_quadrupedal.py
python policydissect/scripts/play_quadrupedal.py --hard
python policydissect/scripts/play_quadrupedal.py --hard --seed 1001

Also, you can collaborate with AI and challenge the hard environment consisting of obstacles and challenging terrains by adding --hard flag. You can change to a different environment by adding --seed your_seed_int_type.

tips: Avoid running fast!

IsaacGym Cassie

The Cassie robot is trained with the code provided by https://github.com/leggedrobotics/legged_gym with a fixed forward command [1, 0, 0], and thus can only move forward. By applying Policy Dissection, primitives related to yaw rate, forward speed, height control and torque force can be identified. Activating these primitives enable various skills like crouching, forward jumping, back-flipping and so on. Run the following command to play with the robot. Add flag--parkourto launch a challenging parkour environment.

# Keymap:
# - KEY_W:Forward
# - KEY_A:Left
# - KEY_S:Stop
# - KEY_C:Crouch
# - KEY_X:Tiptoe
# - KEY_Q:Jump
# - KEY_D:Right
# - KEY_SPACE:Back Flip
# - KEY_R:Reset
python policydissect/scripts/play_cassie.py
python policydissect/scripts/play_cassie.py --parkour

tips: Switch to Tiptoe state before pressing Key_Q to increase the distance of jump.

Note Do not draw the windows or close the pygame window during running.

Gym Environments

We also discover motor primitives in three gym environments: Box2d-BipedalWalker, Mujoco-Ant and Mujoco-Walker. You can try them via:

# BipedalWalker
# Keymap:
# - KEY_W: jump
# - KEY_A: stand up from split
# - KEY_S: restore running after jumping
# - KEY_R: reset
python policydissect/scripts/play_gym_bipedalwalker.py

# Mujoco-Ant
# Keymap:
# - KEY_W: move up
# - KEY_A: move left
# - KEY_S: move down
# - KEY_D: move right
# - KEY_Q: rotation
# - KEY_R: reset
python policydissect/scripts/play_mujoco_ant.py
    
# Mujoco-Walker
# Keymap:
# - KEY_R: reset
# - KEY_A: stop
# - KEY_W: freeze red knee
# - KEY_D: restore running
python policydissect/scripts/play_mujoco_walker.py

Comparison with explicit goal-conditioned control

To measure the coarseness of the control approach enabled by Policy Dissection, we train a goal-conditioned quadrupedal ANYmal robot controller with code provided by https://github.com/leggedrobotics/legged_gym. We build primitive-activation conditional control system on this controller with a PID controller determining the unit output according to the tracking error. As a result, it can track the target yaw command and can achieve the similar control precision, compared to explicitly indicating the goal in the network input. Video is available here.

The experiment script can be found at policydissect/scripts/run_tracking_experiment.py. The default yaw tracking is achieved by explicit goal-conditioned control, while running python policydissect/scripts/run_tracking_experiment.py --primitive_activation will change to primitive-activation conditional control.

Troubleshooting

Installing IsaacGym

If you encounter ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory, run this:

export LD_LIBRARY_PATH=/path/to/libpython/directory
# If you are using Conda, the path should be /path/to/conda/envs/your_env/lib.
# For example:
export LD_LIBRARY_PATH=/home/zhenghao/anaconda3/envs/policydissect/lib

If you encounter CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1., try this:

sudo apt-get install build-essential

If you encounter AttributeError: module 'distutils' has no attribute 'version' from tensorboard, try this:

pip install -U setuptools==50.0.0

Installing Mujoco

If you encounter: fatal error: GL/osmesa.h: No such file or directory:

sudo apt-get install libosmesa6-dev

If you encounter: error: [Errno 2] No such file or directory: 'patchelf': 'patchelf':

sudo apt-get install patchelf

If you encounter: ERROR: GLEW initalization error: Missing GL version:

sudo apt-get install -y libglew-dev

Reference

@inproceedings{
    li2022humanai,
    title={Human-{AI} Shared Control via Policy Dissection},
    author={Quanyi Li and Zhenghao Peng and Haibin Wu and Lan Feng and Bolei Zhou},
    booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=LCOv-GVVDkp}
}