This is a PyTorch implementation of the paper Masked Visual Pre-training for Motor Control. It contains the benchmark suite, pre-trained models, and the training code to reproduce the results from the paper.
We provide pre-trained visual encoders used in the paper. The models are in the same format as mae and timm:
| backbone | objective | data | md5 | download |
|---|---|---|---|---|
| ViT-S | MAE | in-the-wild | fe6e30 | model |
| ViT-S | MAE | ImageNet | 29a004 | model |
| ViT-S | Supervised | ImageNet | f8f23b | model |
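The md5 column lists a short digest prefix for each checkpoint. Here is a minimal sketch for verifying a download against it using only the standard library; the local filename `checkpoint.pth` is a hypothetical placeholder:

```python
import hashlib

def md5_prefix(path, n=6):
    """Return the first n hex characters of a file's md5 digest."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()[:n]

# Compare against the table, e.g. fe6e30 for the in-the-wild MAE ViT-S.
print(md5_prefix("checkpoint.pth"))  # hypothetical filename
```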
You can use our pre-trained encoders directly in your code (e.g., to extract image features) or use them with our benchmark suite and RL training code. We provide instructions for both use cases next.
Install the Python package:

```
pip install git+https://github.com/ir413/mvp
```
Import pre-trained encoders:

```python
import mvp

model = mvp.load("vits-mae-hoi")
model.freeze()
```
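A minimal sketch of feature extraction with a loaded encoder follows. The input size, the random placeholder batch, and the assumption that the encoder's forward pass returns one feature vector per image are ours; consult the model code for the exact preprocessing (resizing and normalization) used during pre-training.

```python
import torch
import mvp

# Load and freeze the ViT-S encoder pre-trained with MAE on in-the-wild data.
model = mvp.load("vits-mae-hoi")
model.freeze()
model.eval()

# Placeholder batch of 4 RGB images; we assume the encoder expects
# 224x224 inputs (standard for ViT-S).
images = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    features = model(images)  # assumed: one feature vector per image

print(features.shape)
```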
Please see INSTALL.md for installation instructions.
Train FrankaPick from states:

```
python tools/train.py task=FrankaPick
```
Train FrankaPick from pixels:

```
python tools/train.py task=FrankaPickPixels
```
Train on 8 GPUs:

```
python tools/train_dist.py num_gpus=8
```
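The distributed launcher presumably composes with the same task override as `tools/train.py`; for example, to train the pixel-based task on 8 GPUs (our assumption, not verified against the configs):

```
python tools/train_dist.py num_gpus=8 task=FrankaPickPixels
```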
Test a policy after N iterations:

```
python tools/train.py test=True headless=False logdir=/path/to/job resume=N
```
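The test overrides should compose with a task override in the same way; for example, to evaluate a pixel-based policy (a sketch combining only the flags shown above):

```
python tools/train.py task=FrankaPickPixels test=True headless=False logdir=/path/to/job resume=N
```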
If you find the code or pre-trained models useful in your research, please use the following BibTeX entry:
```bibtex
@article{Xiao2022,
  title   = {Masked Visual Pre-training for Motor Control},
  author  = {Tete Xiao and Ilija Radosavovic and Trevor Darrell and Jitendra Malik},
  journal = {arXiv:2203.06173},
  year    = {2022}
}
```
We thank the NVIDIA IsaacGym and PhysX teams for making the simulator and preview code examples available.