Masked Visual Pre-training for Motor Control

This is a PyTorch implementation of the paper Masked Visual Pre-training for Motor Control. It contains the benchmark suite, pre-trained models, and the training code to reproduce the results from the paper.

Pre-trained visual encoders

We provide pre-trained visual encoders used in the paper. The models are in the same format as mae and timm:

backbone   objective    data         md5      download
ViT-S      MAE          in-the-wild  fe6e30   model
ViT-S      MAE          ImageNet     29a004   model
ViT-S      Supervised   ImageNet     f8f23b   model
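
If you want to sanity-check a download, you can compare its md5 digest against the table. A minimal sketch; the checkpoint filename below is hypothetical:

import hashlib

# Compute the md5 digest of a downloaded checkpoint, reading in chunks.
def md5_of(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(md5_of("vits-mae-hoi.pth"))  # hypothetical filename; compare against the md5 column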

You can use our pre-trained encoders directly in your code (e.g., to extract image features) or use them with our benchmark suite and RL training code. We provide instructions for both use cases next.

Using pre-trained encoders in your code

Install the Python package:

pip install git+https://github.com/ir413/mvp

Import pre-trained encoders:

import mvp

model = mvp.load("vits-mae-hoi")
model.freeze()
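
As a usage sketch, the frozen encoder can then be applied to a batch of images to extract features. This assumes the model accepts normalized 224x224 RGB inputs and returns one embedding per image; check the model code for the exact preprocessing:

import torch
import mvp

model = mvp.load("vits-mae-hoi")
model.freeze()

# Stand-in for a batch of preprocessed RGB images (B, C, H, W).
images = torch.rand(1, 3, 224, 224)

with torch.no_grad():  # the encoder is frozen; no gradients needed
    features = model(images)  # assumed to return per-image feature embeddings

print(features.shape)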

Benchmark suite and RL training code

Please see INSTALL.md for installation instructions.

Example training commands

Train FrankaPick from states:

python tools/train.py task=FrankaPick

Train FrankaPick from pixels:

python tools/train.py task=FrankaPickPixels

Train on 8 GPUs:

python tools/train_dist.py num_gpus=8
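
The overrides above can plausibly be combined. For example, a hedged sketch of training from pixels on 8 GPUs, assuming the distributed script accepts the same task override as tools/train.py:

python tools/train_dist.py task=FrankaPickPixels num_gpus=8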

Test a policy after N iterations:

python tools/train.py test=True headless=False logdir=/path/to/job resume=N

Citation

If you find the code or pre-trained models useful in your research, please use the following BibTeX entry:

@article{Xiao2022,
  title = {Masked Visual Pre-training for Motor Control},
  author = {Tete Xiao and Ilija Radosavovic and Trevor Darrell and Jitendra Malik},
  journal = {arXiv:2203.06173},
  year = {2022}
}

Acknowledgments

We thank the NVIDIA IsaacGym and PhysX teams for making the simulator and preview code examples available.