
VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (ICLR 2023, Spotlight)

Jason Yecheng Ma¹,², Shagun Sodhani¹, Dinesh Jayaraman², Osbert Bastani², Vikash Kumar*¹, Amy Zhang*¹

¹Meta AI, ²University of Pennsylvania

This is the official repository for VIP, a self-supervised, zero-shot visual reward and representation for unseen downstream robot tasks. This repository contains examples for using the pre-trained VIP model as well as for training VIP from scratch on any custom video dataset.

Installation

Create a conda environment where the packages will be installed.

conda create --name vip python=3.9
conda activate vip

Then, in the root directory of this repository, run:

pip install -e . 

VIP Usage Examples

To load the VIP model pre-trained on Ego4D, simply do:

from vip import load_vip
vip = load_vip()
vip.eval()
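
As a rough, hedged sketch of how the loaded model can be turned into a goal-conditioned reward (the 224x224 RGB input size, the [0, 255] pixel range, and the output embedding shape are assumptions here; see the example scripts referenced below for the exact preprocessing, and move tensors to the model's device if you are running on GPU):

import torch

# Stand-in tensors for the current observation and the goal image
# (assumed: RGB frames resized to 224x224 with pixel values in [0, 255]).
frame = torch.randint(0, 255, (1, 3, 224, 224), dtype=torch.float32)
goal = torch.randint(0, 255, (1, 3, 224, 224), dtype=torch.float32)

with torch.no_grad():
    phi_s = vip(frame)  # embedding of the current observation
    phi_g = vip(goal)   # embedding of the goal image

# VIP defines the reward as the negative embedding distance to the goal,
# so the reward increases as the observation approaches the goal.
reward = -torch.norm(phi_s - phi_g, dim=-1)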

Example code to use the released VIP representation is located here.

We have also included an example for generating embedding distance curves as in our paper using our real-robot demonstrations. You can try it here:

cd vip/examples
python plot_reward_curves.py

This should generate embedding distance curve plots in vip/examples/embedding_curves/.

We also include an example for generating animated embedding distance curves for VIP and other models on robot videos from three different domains. You can try it here:

cd vip/examples
python plot_reward_curves_video.py

This should generate animated embedding distance plots (and more!) in vip/examples/embedding_curves/.

You can easily visualize VIP rewards on your own video by just replacing the video path in the example code!
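
If you prefer to script this yourself, the sketch below gives a hedged outline of what such a curve computation looks like; the frame tensor is a stand-in for a real video, and treating the final frame as the goal is an assumption (the example scripts above handle the actual video loading and preprocessing):

import torch
import matplotlib.pyplot as plt
from vip import load_vip

vip = load_vip()
vip.eval()

# Stand-in for a loaded video: T frames, assumed 224x224 RGB in [0, 255].
# (Move `frames` to the model's device first if you are running on GPU.)
frames = torch.randint(0, 255, (50, 3, 224, 224), dtype=torch.float32)

with torch.no_grad():
    emb = vip(frames)                        # (T, D) per-frame embeddings
    goal = emb[-1]                           # treat the final frame as the goal
    curve = -torch.norm(emb - goal, dim=-1)  # distance-to-goal curve over time

plt.plot(curve.cpu().numpy())
plt.xlabel("frame")
plt.ylabel("negative embedding distance to goal")
plt.savefig("my_embedding_curve.png")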

In addition to this official repository, VIP has also been incorporated into TorchRL as an out-of-the-box visual representation for any Gym environment. After you install TorchRL, using VIP is as simple as:

from torchrl.envs import TransformedEnv
from torchrl.envs.transforms import VIPTransform
env = TransformedEnv(my_env, VIPTransform(keys_in=["next_pixels"], download=True))

Training VIP Representation

Our codebase supports training VIP both on the Ego4D dataset used to pre-train our released VIP model and on any custom video dataset. The video dataset directory should use the following structure (a sketch of a conversion helper follows the layout):

my_dataset_path/
    video0/
        0.png
        1.png
        ...
    video1/
    video2/
    ...
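
As a hypothetical helper (not part of this repository) for dumping video files into this layout, assuming imageio with its ffmpeg plugin is installed and your clips are readable by it:

import os
import imageio

def dump_video(video_path, out_dir):
    # Write each frame of the clip as <index>.png inside out_dir,
    # matching the video0/0.png, 1.png, ... layout above.
    os.makedirs(out_dir, exist_ok=True)
    for i, frame in enumerate(imageio.get_reader(video_path)):
        imageio.imwrite(os.path.join(out_dir, f"{i}.png"), frame)

# e.g. dump_video("clip0.mp4", "my_dataset_path/video0")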

Then, you can train VIP on your dataset by running:

python train_vip.py --config-name=config_vip dataset=my_dataset_name datapath=my_dataset_path

For Ego4D or equivalent large-scale pre-training, we suggest using the config_vip_ego4d.yaml config (the one used for the released VIP model):

python train_vip.py --config-name=config_vip_ego4d dataset=ego4d datapath=ego4d_dataset_path

License

The source code in this repository is licensed under the CC BY-NC 4.0 License.

Citation

If you find this repository or paper useful for your research, please cite:

@article{ma2022vip,
  title={VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training},
  author={Ma, Yecheng Jason and Sodhani, Shagun and Jayaraman, Dinesh and Bastani, Osbert and Kumar, Vikash and Zhang, Amy},
  journal={arXiv preprint arXiv:2210.00030},
  year={2022}
}

Acknowledgements

Parts of this code are adapted from the R3M codebase.