This repository contains example RL environments for the NVIDIA Isaac Gym high-performance environments described in our NeurIPS 2021 Datasets and Benchmarks paper.
Download the Isaac Gym Preview 3 release from the website, then follow the installation instructions in the documentation. We highly recommend using a conda environment to simplify setup.
Ensure that Isaac Gym works on your system by running one of the examples from the python/examples directory, like joint_monkey.py. Follow the troubleshooting steps described in the Isaac Gym Preview 3 install instructions if you have any trouble running the samples.
Once Isaac Gym is installed and samples work within your current python environment, install this repo:
pip install -e .
To train your first policy, run this line:
python train.py task=Cartpole
Cartpole should train to the point that the pole stays upright within a few seconds of starting.
Here's another example - Ant locomotion:
python train.py task=Ant
Note that by default we show a preview window, which will usually slow down training. You can use the v key while running to disable viewer updates and allow training to proceed faster. Hit the v key again to resume viewing after a few seconds of training, once the ants have learned to run a bit better.
Use the esc key or close the viewer window to stop training early.
Alternatively, you can train headlessly, as follows:
python train.py task=Ant headless=True
Ant may take a minute or two to train a policy you can run. When running headlessly, you can stop it early using Control-C in the command line window.
Checkpoints are saved in the folder runs/EXPERIMENT_NAME/nn, where EXPERIMENT_NAME defaults to the task name but can also be overridden via the experiment argument.
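As a minimal sketch of the path convention above, the following helper rebuilds the checkpoint folder from a task name and an optional experiment name. The function name `checkpoint_dir` and the experiment name `ant_baseline` are hypothetical and not part of this repo; the code only mirrors the runs/EXPERIMENT_NAME/nn layout described here.

```python
from pathlib import PurePosixPath
from typing import Optional


def checkpoint_dir(task: str, experiment: Optional[str] = None) -> PurePosixPath:
    """Return runs/EXPERIMENT_NAME/nn (hypothetical helper, not a repo API)."""
    # EXPERIMENT_NAME falls back to the task name when no experiment is given.
    name = experiment if experiment else task
    return PurePosixPath("runs") / name / "nn"


print(checkpoint_dir("Ant"))                  # runs/Ant/nn
print(checkpoint_dir("Ant", "ant_baseline"))  # runs/ant_baseline/nn
```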
To load a trained checkpoint and continue training, use the checkpoint argument:
python train.py task=Ant checkpoint=runs/Ant/nn/Ant.pth
To load a trained checkpoint and only perform inference (no training), pass test=True as an argument, along with the checkpoint name. To avoid rendering overhead, you may also want to run with fewer environments using num_envs=64:
python train.py task=Ant checkpoint=runs/Ant/nn/Ant.pth test=True num_envs=64
Note that if there are special characters such as [ or = in the checkpoint names, you will need to escape them and put quotes around the string. For example,
checkpoint="./runs/Ant/nn/last_Antep\=501rew\[5981.31\].pth"
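The escaping rule above can be automated with a small helper. This is only a sketch: `quote_checkpoint` is a hypothetical name, and the escaped set covers just the characters that appear in the example above (=, [ and ]), so other special characters may need the same treatment.

```python
def quote_checkpoint(path: str) -> str:
    """Build a checkpoint=... override with special characters escaped.

    Hypothetical helper. The escaped set only covers the characters shown
    in the README example; it is an assumption that no others are needed.
    """
    # Prefix each special character with a backslash, leave the rest as-is.
    escaped = "".join("\\" + ch if ch in "=[]" else ch for ch in path)
    # Wrap the whole value in double quotes, as the README example does.
    return f'checkpoint="{escaped}"'


print(quote_checkpoint("./runs/Ant/nn/last_Antep=501rew[5981.31].pth"))
```

The printed string matches the quoted override shown in the example above and can be appended to the train.py command line.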
We use Hydra to manage the config. Note that this has some differences from previous incarnations in older versions of Isaac Gym.
Key arguments to the train.py script are:
task=TASK - selects which task to use. Any of AllegroHand, Ant, Anymal, AnymalTerrain, BallBalance, Cartpole, FrankaCabinet, Humanoid, Ingenuity, Quadcopter, ShadowHand, ShadowHandOpenAI_FF, ShadowHandOpenAI_LSTM, and Trifinger (these correspond to the config for each environment in the folder isaacgymenvs/config/task).
train=TRAIN - selects which training config to use. Will automatically default to the correct config for the environment (i.e. <TASK>PPO).
num_envs=NUM_ENVS - selects the number of environments to use (overriding the default number of environments set in the task config).
seed=SEED - sets a seed value for randomizations, and overrides the default seed set up in the task config.
sim_device=SIM_DEVICE_TYPE - device used for physics simulation. Set to cuda:0 (default) to use GPU and to cpu for CPU. Follows PyTorch-like device syntax.
rl_device=RL_DEVICE - which device / ID to use for the RL algorithm. Defaults to cuda:0, and also follows PyTorch-like device syntax.
graphics_device_id=GRAPHICS_DEVICE_ID - which Vulkan graphics device ID to use for rendering. Defaults to 0. Note that this may be different from the CUDA device ID, and does not follow PyTorch-like device syntax.
pipeline=PIPELINE - which API pipeline to use. Defaults to gpu, can also be set to cpu. When using the gpu pipeline, all data stays on the GPU and everything runs as fast as possible. When using the cpu pipeline, simulation can run on either CPU or GPU, depending on the sim_device setting, but a copy of the data is always made on the CPU at every step.
test=TEST - if set to True, only runs inference on the policy and does not do any training.
checkpoint=CHECKPOINT_PATH - set to the path of the checkpoint to load for training or testing.
headless=HEADLESS - whether to run in headless mode.
experiment=EXPERIMENT - sets the name of the experiment.
max_iterations=MAX_ITERATIONS - sets how many iterations to run for. Reasonable defaults are provided for the provided environments.
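When launching many runs from a script, the KEY=VALUE arguments listed above can be assembled programmatically. The sketch below is illustrative only: `build_train_command` is a hypothetical helper, not something this repo provides, and it simply formats each keyword as one override in the order given.

```python
import shlex
from typing import List


def build_train_command(task: str, **overrides: object) -> List[str]:
    """Assemble a train.py invocation as an argument list (hypothetical helper)."""
    cmd = ["python", "train.py", f"task={task}"]
    # Each keyword becomes one KEY=VALUE override, in the order given.
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = build_train_command("Ant", headless=True, num_envs=64, sim_device="cuda:0")
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains train.py. Passing a list rather than a single string avoids a second layer of shell quoting.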
Hydra also allows setting variables inside config files directly as command line arguments. As an example, to set the discount rate for an rl_games training run, you can use train.params.config.gamma=0.999. Similarly, variables in task configs can also be set. For example, task.env.enableDebugVis=True.
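To make the dotted-key syntax concrete, here is a minimal sketch of how an override such as train.params.config.gamma=0.999 maps onto a nested config. It is not Hydra's parser: `apply_override` is a hypothetical function, the value parsing is deliberately simplistic, and the starting values in `cfg` are placeholders rather than the repo's defaults.

```python
from typing import Any, Dict


def apply_override(cfg: Dict[str, Any], override: str) -> None:
    """Set a nested key from an 'a.b.c=value' string (illustrative only)."""
    dotted_key, raw = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        # Walk down one level per dotted segment, creating levels if missing.
        node = node.setdefault(key, {})
    # Very small value parser: booleans, then ints, then floats, else string.
    value: Any
    if raw in ("True", "False"):
        value = raw == "True"
    else:
        try:
            value = int(raw)
        except ValueError:
            try:
                value = float(raw)
            except ValueError:
                value = raw
    node[leaf] = value


# Placeholder config; the real defaults live in the YAML files.
cfg = {
    "train": {"params": {"config": {"gamma": 0.99}}},
    "task": {"env": {"enableDebugVis": False}},
}
apply_override(cfg, "train.params.config.gamma=0.999")
apply_override(cfg, "task.env.enableDebugVis=True")
print(cfg["train"]["params"]["config"]["gamma"])  # 0.999
print(cfg["task"]["env"]["enableDebugVis"])       # True
```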
Default values for each of these are found in the isaacgymenvs/config/config.yaml file.
The task and train portions of the config work through the use of config groups. You can learn more about how these work in the Hydra documentation on config groups.
The actual configs for task are in isaacgymenvs/config/task/<TASK>.yaml and for train in isaacgymenvs/config/train/<TASK>PPO.yaml.
In some places in the config you will find other variables referenced (for example, num_actors: ${....task.env.numEnvs}). Each . represents going one level up in the config hierarchy. This is documented fully in the OmegaConf documentation on variable interpolation.
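The dot-counting rule can be illustrated with a small pure-Python sketch. This is not how Hydra or OmegaConf implement interpolation; `resolve_relative` is a hypothetical function, and the example assumes the train config sits under a top-level train key and the task config under task, as the train.params... and task.env... override paths suggest. The value 4096 is a placeholder. In this sketch the first dot names the mapping that holds the referencing key, and each further dot moves one level up from there.

```python
from typing import Any, Dict, List


def resolve_relative(root: Dict[str, Any], container: List[str], ref: str) -> Any:
    """Resolve a '${..a.b}'-style reference (illustrative sketch only).

    `container` is the path to the mapping that holds the referencing key.
    """
    inner = ref[2:-1]  # strip the leading '${' and trailing '}'
    dots = len(inner) - len(inner.lstrip("."))
    if dots == 0:
        base: List[str] = []  # no leading dot: absolute path from the root
    else:
        levels_up = dots - 1  # the first dot is the containing mapping itself
        if levels_up > len(container):
            raise KeyError(f"{ref} climbs above the config root")
        base = container[: len(container) - levels_up]
    node: Any = root
    for key in base + inner.lstrip(".").split("."):
        node = node[key]
    return node


cfg = {
    "task": {"env": {"numEnvs": 4096}},  # placeholder value
    "train": {"params": {"config": {"num_actors": "${....task.env.numEnvs}"}}},
}
ref = cfg["train"]["params"]["config"]["num_actors"]
# Four dots from train.params.config: config -> params -> train -> root.
print(resolve_relative(cfg, ["train", "params", "config"], ref))  # 4096
```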
Source code for tasks can be found in isaacgymenvs/tasks.
Each task subclasses the VecEnv base class in isaacgymenvs/base/vec_task.py.
Refer to docs/framework.md for how to create your own tasks.
Full details on each of the tasks available can be found in the RL examples documentation.
IsaacGymEnvs includes a framework for Domain Randomization to improve Sim-to-Real transfer of trained RL policies. You can read more about it here.
If deterministic training of RL policies is important for your work, you may wish to review our Reproducibility and Determinism Documentation.
Please review the Isaac Gym installation instructions first if you run into any issues.
You can submit issues either through GitHub or through the Isaac Gym forum.
Please cite this work as:
@misc{makoviychuk2021isaac,
title={Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning},
author={Viktor Makoviychuk and Lukasz Wawrzyniak and Yunrong Guo and Michelle Lu and Kier Storey and Miles Macklin and David Hoeller and Nikita Rudin and Arthur Allshire and Ankur Handa and Gavriel State},
year={2021},
journal={arXiv preprint arXiv:2108.10470}
}
If you use the ANYmal rough terrain environment in your work, please ensure you cite the following work:
@misc{rudin2021learning,
title={Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning},
author={Nikita Rudin and David Hoeller and Philipp Reist and Marco Hutter},
year={2021},
journal = {arXiv preprint arXiv:2109.11978}
}
If you use the Trifinger environment in your work, please ensure you cite the following work:
@misc{isaacgym-trifinger,
title = {{Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger}},
author = {Allshire, Arthur and Mittal, Mayank and Lodaya, Varun and Makoviychuk, Viktor and Makoviichuk, Denys and Widmaier, Felix and Wuthrich, Manuel and Bauer, Stefan and Handa, Ankur and Garg, Animesh},
year = {2021},
journal = {arXiv preprint arXiv:2108.09779}
}
If you use the AMP: Adversarial Motion Priors environment in your work, please ensure you cite the following work:
@article{2021-TOG-AMP,
author = {Peng, Xue Bin and Ma, Ze and Abbeel, Pieter and Levine, Sergey and Kanazawa, Angjoo},
title = {AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control},
journal = {ACM Trans. Graph.},
issue_date = {August 2021},
volume = {40},
number = {4},
month = jul,
year = {2021},
articleno = {1},
numpages = {15},
url = {http://doi.acm.org/10.1145/3450626.3459670},
doi = {10.1145/3450626.3459670},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {motion control, physics-based character animation, reinforcement learning},
}