Official implementations of
Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers (LocoTransformer)
Ruihan Yang*, Minghao Zhang*, Nicklas Hansen, Huazhe Xu, Xiaolong Wang
[Arxiv] [Webpage] [ICLR Paper]
and
Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization (MMDR)
Chieko Sarah Imai*, Minghao Zhang*, Yuchen Zhang*, Marcin Kierebiński, Ruihan Yang, Yuzhe Qin, Xiaolong Wang
Our repository contains the functions needed to train the policy in simulation and to deploy the learned policy on the real robot. Tested directly on the real robot, the A1 traverses a variety of real-world scenarios:
With the depth maps as policy input, RL agents can learn to navigate through our simulated environments. We mainly use the following scenarios (tasks in LocoTransformer):
The following visualizations show how the LocoTransformer attends to different regions of the image:
In MMDR, we also simulate several sim-to-real gaps. For example, we simulate the "blinding spots" of the RealSense when sampling depth images:
We also provide the functions to simulate the multi-modal latencies.
Please refer to our paper for more details.
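As a rough illustration of the multi-modal latency idea, the sketch below gives each modality its own history buffer and samples a new delay at every episode reset. The class name DelayedObservation, the max_delay values, and the integer observations are assumptions made for illustration; they are not the repository's functions or the paper's settings.

```python
import random
from collections import deque


class DelayedObservation:
    """Hypothetical helper: returns the observation recorded `delay` steps ago."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        # Holds the newest max_delay + 1 observations; index -1 is the newest.
        self.buffer = deque(maxlen=max_delay + 1)
        self.delay = 0

    def reset(self, first_obs, rng):
        # Sample one latency for the whole episode and pre-fill the history.
        self.delay = rng.randint(0, self.max_delay)
        self.buffer.clear()
        for _ in range(self.max_delay + 1):
            self.buffer.append(first_obs)

    def step(self, new_obs):
        # Store the fresh observation, then read the delayed one.
        self.buffer.append(new_obs)
        return self.buffer[-1 - self.delay]


rng = random.Random(0)
proprio = DelayedObservation(max_delay=2)  # assumed: short proprioceptive latency
vision = DelayedObservation(max_delay=8)   # assumed: longer depth-camera latency
proprio.reset(first_obs=0, rng=rng)
vision.reset(first_obs=0, rng=rng)

for t in range(1, 11):
    delayed_proprio = proprio.step(t)  # observation from step t - proprio.delay
    delayed_vision = vision.step(t)    # observation from step t - vision.delay
```

Keeping one buffer per modality lets the two latencies differ, which is the point of randomizing them separately.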
We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:
pip install -e .
We use config files in the config folder to configure the training and environment parameters.
In config/rl/, there are three types of config files:
static: Train on flat ground or uneven terrain with static obstacles.
moving: Train on flat ground or uneven terrain with moving obstacles.
challenge: Challenging scenarios such as mountains and hills.
In each folder, we have subfolders for different algorithms:
naive_baseline: Train a naive baseline policy. No frame-extraction, no delay randomization.
frame_extract4: Train a policy with frame-extraction (k=4 in the MMDR paper). No delay randomization.
frame_extract4_fixed_delay: Train a policy with frame-extraction (k=4 in the MMDR paper) and a fixed delay in all episodes.
frame_extract4_random_delay: Train a policy with frame-extraction (k=4 in the MMDR paper) and a random delay in each episode.
locotransformer: Train a LocoTransformer policy.
locotransformer_random_delay: Train a LocoTransformer policy with a random delay in each episode.
For the challenging scenarios, we only provide the baseline and locotransformer config folders, since we did not conduct MMDR experiments on these scenarios.
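The folder layout described above can be sketched as a small path helper. This is only an illustration: the function name config_path is hypothetical, and the "<task>.json" file naming is inferred from the single example path config/rl/static/locotransformer/thin-goal.json used in the training command.

```python
from pathlib import PurePosixPath

# Scenario folder names copied from the list above.
SCENARIOS = ("static", "moving", "challenge")


def config_path(scenario, algorithm, task, root="config/rl"):
    """Hypothetical helper: build config/rl/<scenario>/<algorithm>/<task>.json."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario {scenario!r}, expected one of {SCENARIOS}")
    # The algorithm folder is not validated here, because the set of
    # available folders differs between scenarios.
    return PurePosixPath(root) / scenario / algorithm / f"{task}.json"


path = config_path("static", "locotransformer", "thin-goal")
print(path)
```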
While the RL policy can navigate from proprioception and vision by outputting joint angles directly, we also provide visual-MPC training, in which the policy outputs a control command (linear and angular velocity). See the comparison in the LocoTransformer paper.
To reproduce, we use the following config folders:
mpc: Train a visual-MPC policy with both proprioception and vision as input.
mpc_vision_only: Train a visual-MPC policy with only vision as input.
You can also configure the network architecture in config/. We use only PPO for all experiments, so we do not include configurations for other algorithms, but they are still available in the torchrl library.
The starter directory contains training and evaluation scripts for all the included algorithms. The config directory contains training configuration files for all the experiments. You can run the Python scripts directly. For example, for training, call
python starter/ppo_locotransformer.py \
--config config/rl/static/locotransformer/thin-goal.json \
--seed 0 \
--log_dir {YOUR_LOG_DIR} \
--id {YOUR_ID}
to run PPO+LocoTransformer on the thin-goal environment. You can then use
python starter/locotransformer_viewer.py \
--seed 0 \
--log_dir {YOUR_LOG_DIR} \
--id {YOUR_ID} \
--env_name A1MoveGround
to test the trained model on the same environment.
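To repeat the training call over several seeds, a small launcher along the following lines may be convenient. It only uses the flags shown above; the function name training_command, the log directory ./log, and the run id thin-goal-demo are placeholders, and reusing one id across seeds is an assumption about how logs are organized.

```python
import shlex


def training_command(config, seed, log_dir, run_id,
                     script="starter/ppo_locotransformer.py"):
    """Hypothetical helper: assemble the training call as an argument list."""
    return [
        "python", script,
        "--config", config,
        "--seed", str(seed),
        "--log_dir", log_dir,
        "--id", run_id,
    ]


commands = [
    training_command(
        config="config/rl/static/locotransformer/thin-goal.json",
        seed=seed,
        log_dir="./log",          # placeholder for {YOUR_LOG_DIR}
        run_id="thin-goal-demo",  # placeholder for {YOUR_ID}
    )
    for seed in range(3)
]

for cmd in commands:
    # Print the commands for inspection; to actually launch them, use
    # subprocess.run(cmd, check=True) instead.
    print(shlex.join(cmd))
```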
Our robot interface is based on Motion Imitation, so we provide only simplified setup instructions here. For the detailed version, please check Motion Imitation.
Build the Python interface by running the following:
cd third_party/unitree_legged_sdk
mkdir build
cd build
cmake ..
make
Then copy the built robot_interface.XXX.so file to the main directory (where you can see this README.md file).
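The copy step can also be scripted. The sketch below assumes the shared library ends up in third_party/unitree_legged_sdk/build and matches robot_interface*.so; the exact file name depends on your Python version and platform, and the function name copy_robot_interface is hypothetical.

```python
import shutil
from pathlib import Path


def copy_robot_interface(build_dir, repo_root):
    """Hypothetical helper: copy every robot_interface*.so from build_dir to repo_root."""
    matches = sorted(Path(build_dir).glob("robot_interface*.so"))
    if not matches:
        raise FileNotFoundError(f"no robot_interface*.so found in {build_dir}")
    # shutil.copy2 returns the destination path of the copied file.
    return [Path(shutil.copy2(src, repo_root)) for src in matches]


# Example call from the repository root (assumed build location):
# copy_robot_interface("third_party/unitree_legged_sdk/build", ".")
```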
To deploy the visual policy, you also need to set up the Intel RealSense interface. Please check the detailed instructions in Librealsense.
# Current
bash a1_hardware/convert_tensor_rt/convert_trt.sh $EXP_ID $SEED $LOG_ROOT_PATH
# For the TensorRT version
python a1_hardware/execute_locotransformer_trt.py
# For the PyTorch version
python a1_hardware/execute_locotransformer.py
We use joint control for the A1; the default action and action scale are predefined, as is the normalization information for the depth input.
Some system paths and configurations may vary; please modify them accordingly.
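For readers unfamiliar with this convention, a common way to combine a default action and an action scale is target = default + scale * action, with the depth input standardized by a fixed mean and standard deviation. The sketch below illustrates that convention only; the clipping range, the numeric values, and the function names are assumptions, not the values or code used in a1_hardware.

```python
def joint_targets(policy_output, default_action, action_scale):
    """Map policy outputs to joint angle targets (assumed convention)."""
    # Assumption: policy outputs are clipped to [-1, 1] before scaling.
    clipped = [max(-1.0, min(1.0, a)) for a in policy_output]
    return [d + s * a for d, s, a in zip(default_action, action_scale, clipped)]


def normalize_depth(depth_values, mean, std):
    """Standardize depth readings with predefined statistics (assumed form)."""
    return [(d - mean) / std for d in depth_values]


# Hypothetical numbers for three joints of one leg (radians).
default_action = [0.0, 0.9, -1.8]
action_scale = [0.2, 0.2, 0.2]
targets = joint_targets([0.0, 2.0, -0.5], default_action, action_scale)

# Hypothetical depth statistics (meters).
depth = normalize_depth([0.5, 1.5, 2.5], mean=1.5, std=1.0)
```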
See LocoTransformer and MMDR for results.
If you find our code useful in your research, please consider citing our work as follows:
for LocoTransformer:
@inproceedings{
yang2022learning,
title={Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers},
author={Ruihan Yang and Minghao Zhang and Nicklas Hansen and Huazhe Xu and Xiaolong Wang},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=nhnJ3oo6AB}
}
for MMDR:
@inproceedings{Imai2021VisionGuidedQL,
title={Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization},
author={Chieko Imai and Minghao Zhang and Yuchen Zhang and Marcin Kierebinski and Ruihan Yang and Yuzhe Qin and Xiaolong Wang},
booktitle={2022 IEEE/RSJ international conference on intelligent robots and systems (IROS)},
}
This repository is a product of our work on LocoTransformer and MMDR. Our RL implementation is based on TorchRL, and the environment implementation is based on Google's motion-imitation. The Python interface for the real robot is also based on Google's motion-imitation.