
MoCapAct


Paper: MoCapAct: A Multi-Task Dataset for Simulated Humanoid Control

This is the codebase for the MoCapAct project, which uses motion capture (MoCap) clips to learn low-level motor skills for the "CMU Humanoid" from the dm_control package. This repo contains all code to:

  • train the clip snippet experts,
  • collect expert rollouts into a dataset,
  • download our experts and rollouts from the command line,
  • perform policy distillation,
  • perform reinforcement learning on downstream tasks, and
  • perform motion completion.

For more information on the project and to download the entire dataset, please visit the project website.

For users interested in development, we recommend reading the documentation of the dm_control package.

Setup

MoCapAct requires Python 3.7+. We recommend that you use a virtual environment. For example, using conda:

conda create -n MoCapAct pip python==3.8
conda activate MoCapAct

To install the package, we recommend cloning the repo and installing the local copy:

git clone https://github.com/microsoft/MoCapAct.git
cd MoCapAct
pip install -e .
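
To sanity-check the installation, you can import the package and build a single-clip tracking environment. Below is a minimal smoke test that mirrors the loading examples later in this README (the clip ID is just an example):

from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types

# Build a tracking environment for a single example clip.
dataset = types.ClipCollection(ids=['CMU_016_22'])
env = tracking.MocapTrackingGymEnv(dataset)
print(env.observation_space)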

Dataset

The MoCapAct dataset consists of clip experts trained on the MoCap snippets and the rollouts from those experts.

We provide the dataset and models on the MoCapAct collection on Hugging Face. This collection consists of two pages:

  • A model zoo which contains the clip snippet experts, multiclip policies, RL-trained policies for the transfer tasks, and the GPT policy.
  • A dataset page which contains the small rollout dataset and large rollout dataset.
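
The model and dataset files can also be fetched programmatically with the huggingface_hub client. Below is a hedged sketch: the repository ID and file patterns are assumptions, so substitute the repositories listed on the MoCapAct collection page.

from huggingface_hub import snapshot_download

# Download part of the rollout dataset; repo_id and allow_patterns are illustrative.
snapshot_download(
    repo_id="microsoft/mocapact-data",   # replace with the dataset repo from the collection
    repo_type="dataset",
    allow_patterns=["*CMU_009_12*"],     # restrict to specific files to limit the download size
    local_dir="./data",
)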

Description

Clip snippet experts

We signify a clip snippet expert by the snippet it is tracking. Taking CMU_009_12-165-363 as an example expert, the file hierarchy for the snippet expert is:

CMU_009_12-165-363
├── clip_info.json         # Contains clip ID, start step, and end step
└── eval_rsi/model
    ├── best_model.zip     # Contains policy parameters and hyperparameters
    └── vecnormalize.pkl   # Used to get normalizer for observation and reward
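
For example, clip_info.json can be read directly to recover which clip and snippet boundaries the expert was trained on (a minimal sketch using the example expert above):

import json

# clip_info.json stores the clip ID along with the snippet's start and end steps.
with open("data/experts/CMU_009_12-165-363/clip_info.json") as f:
    clip_info = json.load(f)
print(clip_info)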

The expert policy can be loaded using our repository:

from mocapact import observables
from mocapact.sb3 import utils
expert_path = "data/experts/CMU_009_12-165-363/eval_rsi/model"
expert = utils.load_policy(expert_path, observables.TIME_INDEX_OBSERVABLES)

from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types
dataset = types.ClipCollection(ids=['CMU_009_12'], start_steps=[165], end_steps=[363])
env = tracking.MocapTrackingGymEnv(dataset)
obs, done = env.reset(), False
while not done:
    action, _ = expert.predict(obs, deterministic=True)
    obs, rew, done, _ = env.step(action)
    print(rew)

Expert rollouts

The expert rollouts consist of a collection of HDF5 files, one per clip. An HDF5 file contains expert rollouts for each constituent snippet as well as miscellaneous information and statistics. To facilitate efficient loading of the observations, we concatenate all the proprioceptive observations (joint angles, joint velocities, actuator activations, etc.) from an episode into a single numerical array and provide indices for the constituent observations in the observable_indices group.

Taking CMU_009_12.hdf5 (which contains three snippets) as an example, we have the following HDF5 hierarchy:

CMU_009_12.hdf5
├── n_rsi_rollouts                # R, number of rollouts from random time steps in snippet
├── n_start_rollouts              # S, number of rollouts from start of snippet
├── ref_steps                     # Indices of MoCap reference relative to current time step. Here, (1, 2, 3, 4, 5).
├── observable_indices
│   └── walker
│       ├── actuator_activation   # (0, 1, ..., 54, 55)
│       ├── appendages_pos        # (56, 57, ..., 69, 70)
│       ├── body_height           # (71)
│       ├── ...
│       └── world_zaxis           # (2865, 2866, 2867)
│
├── stats                         # Statistics computed over the entire dataset
│   ├── act_mean                  # Mean of the experts' sampled actions
│   ├── act_var                   # Variance of the experts' sampled actions
│   ├── mean_act_mean             # Mean of the experts' mean actions
│   ├── mean_act_var              # Variance of the experts' mean actions
│   ├── proprio_mean              # Mean of the proprioceptive observations
│   ├── proprio_var               # Variance of the proprioceptive observations
│   └── count                     # Number of observations in dataset
│
├── CMU_009_12-0-198              # Rollouts for the snippet CMU_009_12-0-198
├── CMU_009_12-165-363            # Rollouts for the snippet CMU_009_12-165-363
└── CMU_009_12-330-529            # Rollouts for the snippet CMU_009_12-330-529

Each snippet group contains $R+S$ episodes. The first $S$ episodes correspond to episodes initialized from the start of the snippet and the last $R$ episodes to episodes initialized at random points in the snippet. We now uncollapse the CMU_009_12-165-363 group within the HDF5 file to reveal the rollout structure:

CMU_009_12-165-363
├── early_termination                  # (R+S)-boolean array indicating which episodes terminated early
├── rsi_metrics                        # Metrics for episodes that initialize at random points in the snippet
│   ├── episode_returns                # R-array of episode returns
│   ├── episode_lengths                # R-array of episode lengths
│   ├── norm_episode_returns           # R-array of normalized episode rewards
│   └── norm_episode_lengths           # R-array of normalized episode lengths
├── start_metrics                      # Metrics for episodes that initialize at the start of the snippet
│
├── 0                                  # First episode, of length T
│   ├── observations
│   │   ├── proprioceptive             # (T+1)-array of proprioceptive observations
│   │   ├── walker/body_camera         # (T+1)-array of images from body camera **(not included)**
│   │   └── walker/egocentric_camera   # (T+1)-array of images from egocentric camera **(not included)**
│   ├── actions                        # T-array of sampled actions executed in environment
│   ├── mean_actions                   # T-array of corresponding mean actions
│   ├── rewards                        # T-array of rewards from environment
│   ├── values                         # T-array computed using the policy's value network
│   └── advantages                     # T-array computed using generalized advantage estimation
│
├── 1                                  # Second episode
├── ...
└── R+S-1                              # (R+S)th episode

To keep the dataset size manageable, we do not include image observations in the dataset. The camera images can be logged by providing the flags --log_all_proprios --log_cameras to the mocapact/distillation/rollout_experts.py script.

The HDF5 rollouts can be read and utilized in Python:

import h5py
dset = h5py.File("data/small_dataset/CMU_009_12.hdf5", "r")
print("Expert actions from first rollout episode of second snippet:")
print(dset["CMU_009_12-165-363/0/actions"][...])

We provide a "large" dataset where $R = S = 100$ (with size 600 GB) and a "small" dataset where $R = S = 10$ (with size 50 GB).

Examples

Below are the commands we used for our paper.

Clip snippet experts

Training a clip snippet expert:

python -m mocapact.clip_expert.train \
  --clip_id [CLIP_ID] `# e.g., CMU_016_22` \
  --start_step [START_STEP] `# e.g., 0` \
  --max_steps [MAX_STEPS] `# e.g., 210 (can be larger than clip length)` \
  --n_workers [N_CPU] `# e.g., 8` \
  --log_root experts \
  $(cat cfg/clip_expert/train.txt)

Evaluating a clip snippet expert (numerical evaluation and visual evaluation):

python -m mocapact.clip_expert.evaluate \
  --policy_root [POLICY_ROOT] `# e.g., experts/CMU_016_22-0-82/0/eval_rsi/model` \
  --n_workers [N_CPU] `# e.g., 8` \
  --n_eval_episodes 1000 `# set to 0 to just run the visualizer` \
  $(cat cfg/clip_expert/evaluate.txt)

We can also load the experts in Python:

from mocapact import observables
from mocapact.sb3 import utils
expert_path = "experts/CMU_016_22-0-82/0/eval_rsi/model" # example path
expert = utils.load_policy(expert_path, observables.TIME_INDEX_OBSERVABLES)

from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import types
dataset = types.ClipCollection(ids=['CMU_016_22'])
env = tracking.MocapTrackingGymEnv(dataset)
obs, done = env.reset(), False
while not done:
    action, _ = expert.predict(obs, deterministic=True)
    obs, rew, done, _ = env.step(action)
    print(rew)

Creating rollout dataset

Rolling out a collection of experts and collecting into a dataset:

python -m mocapact.distillation.rollout_experts \
  --input_dirs [EXPERT_ROOT] `# e.g., experts` \
  --n_workers [N_CPU] `# e.g., 8` \
  --device [DEVICE] `# e.g., cuda` \
  --output_path dataset/file_name_ignored.hdf5 \
  $(cat cfg/rollout.txt)

This will result in a collection of HDF5 files (one per clip), which can be read and utilized in Python:

import h5py
dset = h5py.File("dataset/CMU_016_22.hdf5", "r")
print("Expert actions from first rollout episode:")
print(dset["CMU_016_22-0-82/0/actions"][...])

Multi-clip policy

Training a multi-clip policy on the entire MoCapAct dataset:

source scripts/get_all_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.train \
  --train_dataset_paths $train \
  --val_dataset_paths $val \
  --dataset_metrics_path $metrics \
  --extra_clips $clips \
  --output_root multi_clip/all \
  --gpus 0 `# indices of GPUs` \
  $(cat cfg/multi_clip/train.txt) \
  $(cat cfg/multi_clip/rwr.txt) `# rwr can be replaced with awr, cwr, or bc` \
  --model.config.embed_size 60 \
  --eval.n_workers [N_CPU] `# e.g., 16`

Training a multi-clip policy on the locomotion subset of the MoCapAct dataset:

source scripts/get_locomotion_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.train \
  --train_dataset_paths $train \
  --dataset_metrics_path $metrics \
  --extra_clips $clips \
  --output_root multi_clip/locomotion \
  --gpus 0 `# indices of GPUs` \
  $(cat cfg/multi_clip/train.txt) \
  $(cat cfg/multi_clip/rwr.txt) `# rwr can be replaced with awr, cwr, or bc` \
  --model.config.embed_size 20 \
  --eval.n_workers [N_CPU] `# e.g., 16`

Evaluating a multi-clip policy on all the snippets within the MoCapAct dataset (numerical evaluation and visual evaluation):

source scripts/get_all_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.evaluate \
  --policy_path [POLICY_PATH] `# e.g., multi_clip/all/eval/train_rsi/best_model.ckpt` \
  --clip_snippets $snippets \
  --n_workers [N_CPU] `# e.g., 8` \
  --device [DEVICE] `# e.g., cuda` \
  --n_eval_episodes 1000 `# set to 0 to just run the visualizer` \
  $(cat cfg/multi_clip/evaluate.txt)

The multi-clip policy can be loaded using PyTorch Lightning's functionality to interact with the environment:

from mocapact.distillation import model
model_path = "multi_clip/all/eval/train_rsi/best_model.ckpt"
policy = model.NpmpPolicy.load_from_checkpoint(model_path, map_location="cpu")

from mocapact.envs import tracking
from dm_control.locomotion.tasks.reference_pose import cmu_subsets
dataset = cmu_subsets.ALL
ref_steps = (1, 2, 3, 4, 5)
env = tracking.MocapTrackingGymEnv(dataset, ref_steps)
obs, done = env.reset(), False
embed = policy.initial_state(deterministic=False)
while not done:
    action, embed = policy.predict(obs, state=embed, deterministic=False)
    obs, rew, done, _ = env.step(action)
    print(rew)

RL transfer tasks

Training a task policy (here, with a pre-defined low-level policy):

python -m mocapact.transfer.train \
  --log_root [LOG_ROOT] `# e.g., transfer/go_to_target` \
  $(cat cfg/transfer/train.txt) \
  $(cat cfg/transfer/go_to_target.txt) `# set to cfg/transfer/velocity_control.txt for velocity control` \
  $(cat cfg/transfer/with_low_level.txt) `# set to cfg/transfer/no_low_level.txt for no low-level policy`

Evaluating a task policy:

python -m mocapact.transfer.evaluate \
  --model_root [MODEL_ROOT] `# e.g., transfer/go_to_target/0/eval/model` \
  --task [TASK] `# e.g., mocapact/transfer/config.py:go_to_target or velocity_control`

Motion completion

Training a GPT policy on the entire MoCapAct dataset:

source scripts/get_all_clips.sh [PATH_TO_DATASET]
python -m mocapact.distillation.train \
  --train_dataset_paths $train \
  --val_dataset_paths $val \
  --dataset_metrics_path $metrics \
  --output_root motion_completion \
  $(cat cfg/motion_completion/train.txt)

Performing motion completion with a trained GPT policy:

python -m mocapact.distillation.motion_completion \
  --policy_path [POLICY_PATH] `# e.g., motion_completion/model/last.ckpt` \
  --expert_root [EXPERT_ROOT] `# e.g., experts` \
  --clip_snippet [CLIP_SNIPPET] `# e.g., CMU_016_22` \
  --n_workers [N_CPU] `# e.g., 8` \
  --device [DEVICE] `# e.g., cuda` \
  --n_eval_episodes 100 `# Set to 0 to just run the visualizer` \
  $(cat cfg/motion_completion/evaluate.txt)

To generate a prompt, we pass a path to the directory of snippet experts via --expert_root. Alternatively, you can pass a path to a multi-clip policy through --distillation_path, though it will likely produce lower-quality prompts than the snippet experts.

Dataset Interface

We provide two dataset interfaces in this repo. The ExpertDataset is used for imitation learning, e.g., to train a multi-clip tracking policy or a GPT policy for motion completion. The D4RLDataset is used for offline reinforcement learning. For instantiations of the dataset that are small enough to fit into memory, the user can call D4RLDataset.get_in_memory_rollouts to load a batch of transitions into memory. For instantiations that do not fit into memory (e.g., the entire MoCapAct dataset), the user can treat the dataset as a PyTorch Dataset and iterate over the transitions via __getitem__.
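
As a rough sketch of how these dataset classes plug into a standard PyTorch training loop, the snippet below wraps an ExpertDataset in a DataLoader. The constructor arguments are illustrative assumptions; consult the dataset module in the repo for the actual signature.

import torch
from mocapact import observables
from mocapact.distillation import dataset  # module location is an assumption

# Constructor arguments are illustrative only; check the repo for the real signature.
expert_dataset = dataset.ExpertDataset(
    hdf5_fnames=["data/small_dataset/CMU_009_12.hdf5"],
    observables=observables.TIME_INDEX_OBSERVABLES,
)

# Any torch.utils.data.Dataset can be wrapped in a DataLoader for minibatch training.
loader = torch.utils.data.DataLoader(expert_dataset, batch_size=256, shuffle=True)
for batch in loader:
    # Feed the batch to an imitation-learning update here.
    break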

Example Projects That Use MoCapAct

  • H-GAP: Humanoid Control with a Generalist Planner, by Zhengyao Jiang, Yingchen Xu, et al. (Paper, Website, Code)
  • Leveraging Demonstrations with Latent Space Priors, by Jonas Gehring et al. (Paper, Website, Code)
  • Hierarchical World Models as Visual Whole-Body Humanoid Controllers, by Nicklas Hansen et al. (Paper, Website, Code)
  • Body Transformer: Leveraging Robot Embodiment for Policy Learning, by Carmelo Sferrazza et al. (Paper, Website, Code)

Citation

If you reference or use MoCapAct in your research, please cite:

@inproceedings{wagener2022mocapact,
  title={{MoCapAct}: A Multi-Task Dataset for Simulated Humanoid Control},
  author={Wagener, Nolan and Kolobov, Andrey and Frujeri, Felipe Vieira and Loynd, Ricky and Cheng, Ching-An and Hausknecht, Matthew},
  booktitle={Advances in Neural Information Processing Systems},
  volume={35},
  pages={35418--35431},
  year={2022}
}

Licenses

The code is licensed under the MIT License. The dataset is licensed under the CDLA Permissive v2 License.