This is a repository containing the code for the paper:

**For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal**
Yingdong Hu, Renhao Wang, Li Erran Li, and Yang Gao
ICML 2023
- Install the following system libraries:

```
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
```
- Set up the conda environment:

```
conda env create -f conda_env.yml
conda activate pvm
```
- Install PyTorch, torchvision, and timm following the official instructions. For example:

```
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install timm==0.4.5
```
- Install MuJoCo version 2.1 and mujoco-py:
  - Follow the instructions in the mujoco-py package.
  - Make sure that the GPU version of mujoco-py gets built, so that image rendering is fast. An easy way to ensure this is to clone the mujoco-py repository, change this line to `Builder = LinuxGPUExtensionBuilder`, and install from source by running `pip install -e .` in the `mujoco-py` root directory. You can also download our modified mujoco-py package and install it from source.
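To check which builder was actually used, you can inspect the compiled Cython extension after importing mujoco-py. This is a heuristic based on how mujoco-py names its build artifacts, and the exact module name can vary across versions:

```python
# Heuristic sanity check: on a successful GPU build, the compiled extension's
# module name typically contains "linuxgpuextensionbuilder".
import mujoco_py  # the first import triggers the cymj build

print(mujoco_py.cymj)  # e.g. <module 'cymj_..._linuxgpuextensionbuilder_...'>
```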
- Install Meta-World. Download the package from here, then install it from source:

```
pip install -e /path/to/dir/metaworld
```
- Install Robosuite. We use the `offline_study` branch of Robosuite; download it from here, then install it from source:

```
pip install -e /path/to/dir/robosuite-offline_study
```
- Install Franka-Kitchen. Follow the instructions in the R3M repository. Unlike R3M, we only randomize the pose of the robot arm between episodes, not the kitchen, so be sure to add the line `FIXED_ENTRY_POINT = RANDOM_ENTRY_POINT` here: https://github.com/vikashplus/mj_envs/blob/stable/mj_envs/envs/relay_kitchen/__init__.py#L160. Note that we use `RANDOM_ENTRY_POINT`, not `RANDOM_DESK_ENTRY_POINT`.
The pre-trained vision models evaluated in this project are listed below.

| Model | Architecture | Highlights | Link |
|---|---|---|---|
| MoCo v2 | ResNet-50 | Contrastive learning, momentum encoder | download |
| SwAV | ResNet-50 | Contrast online cluster assignments | download |
| SimSiam | ResNet-50 | Without negative pairs | download |
| DenseCL | ResNet-50 | Dense contrastive learning, learn local features | download |
| PixPro | ResNet-50 | Pixel-level pretext task, learn local features | download |
| VICRegL | ResNet-50 | Learn global and local features | download |
| VFS | ResNet-50 | Encode temporal dynamics | download |
| R3M | ResNet-50 | Learn visual representations for robotics | download |
| VIP | ResNet-50 | Learn representations and reward for robotics | download |
| MoCo v3 | ViT-B/16 | Contrastive learning for ViT | download |
| DINO | ViT-B/16 | Self-distillation with no labels | download |
| MAE | ViT-B/16 | Masked image modeling (MIM) | download |
| iBOT | ViT-B/16 | Combine self-distillation with MIM | download |
| CLIP | ViT-B/16 | Language-supervised pre-training | download |
After downloading a pre-trained vision model, place it under the `PVM-Robotics/pretrained/` folder. Please don't modify the file names of these checkpoints.
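As a quick sanity check, you can load a downloaded checkpoint on CPU and inspect it. The file name below is a placeholder (keep the original name of whichever checkpoint you downloaded), and the `state_dict` unwrapping is a guess at common checkpoint layouts rather than a guarantee about these particular files:

```python
# Sanity-check a downloaded checkpoint (placeholder file name).
import torch

ckpt = torch.load("pretrained/mocov2-resnet50.pth", map_location="cpu")
# Released checkpoints may store weights directly, or nest them under a key
# such as "state_dict" or "model".
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"loaded {len(state_dict)} entries")
```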
- Download the expert demonstrations for all tasks from here.
- Unzip `expert_demos.zip` and place the `expert_demos` directory into `PVM-Robotics/expert_demos`.
- Set the `path/to/dir` portion of the `root_dir` path variable in `cfgs/config.yaml` to the path of the PVM-Robotics repository.
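If later commands fail to find files, a quick way to confirm the layout described above is to check that the expected paths resolve. The `root_dir` value here uses the same placeholder as `cfgs/config.yaml`:

```python
# Verify the repository layout the configs expect.
import os

root_dir = "/path/to/dir/PVM-Robotics"  # same value as root_dir in cfgs/config.yaml
for sub in ("pretrained", "expert_demos", "cfgs/config.yaml"):
    path = os.path.join(root_dir, sub)
    print(f"{path}: {'ok' if os.path.exists(path) else 'MISSING'}")
```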
To train an RL agent (DrQ-v2) on a Meta-World task, run:

```
python train_rl.py \
    agent=drqv2 \
    suite=metaworld \
    suite/metaworld_task=hammer \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    replay_buffer_size=500000 suite.num_seed_frames=4000 batch_size=512 \
    use_wandb=true seed=1 exp_prefix=RL
```
- `suite/metaworld_task` can be set to `hammer`, `drawer_close`, `door_open`, `bin_picking`, `button_press_topdown`, `window_close`, `lever_pull`, and `coffee_pull`.
- When `agent.backbone` is set to `resnet`, `agent.embedding_name` can be set to `mocov2-resnet50`, `simsiam-resnet50`, `swav-resnet50`, `densecl-resnet50`, `pixpro-resnet50`, `vicregl-resnet50`, `vfs-resnet50`, `r3m-resnet50`, and `vip-resnet50_VIPfc`.
- When `agent.backbone` is set to `vit`, `agent.embedding_name` can be set to `mocov3-vit-b16`, `dino-vit-b16`, `ibot-vit-b16`, `clip-vit-b16`, and `mae-vit-b16`. (A minimal sketch of the frozen feature extraction these backbones perform follows this list.)
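Regardless of which name is selected, the backbone plays the same role: it stays frozen and maps each image observation to a fixed embedding that the policy consumes. The sketch below is illustrative only; it uses a randomly initialized torchvision ResNet-50 rather than the project's actual checkpoint-loading code:

```python
# Illustrative only: a frozen ResNet-50 turning an RGB frame into a 2048-d
# embedding, mirroring the role of agent.backbone / agent.embedding_name.
import torch
import torchvision

backbone = torchvision.models.resnet50()
backbone.fc = torch.nn.Identity()  # drop the classification head
backbone.eval()                    # the backbone stays frozen

frame = torch.randn(1, 3, 224, 224)  # dummy RGB observation
with torch.no_grad():
    embedding = backbone(frame)
print(embedding.shape)  # torch.Size([1, 2048])
```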
To train an RL agent (DrQ-v2) on a Robosuite task, run:

```
python train_rl.py \
    agent=drqv2 \
    suite=robosuite \
    suite/robosuite_task=panda_door \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    replay_buffer_size=500000 suite.num_seed_frames=4000 batch_size=512 \
    use_wandb=true seed=1 exp_prefix=RL
```
- `suite/robosuite_task` can be set to `panda_door`, `panda_lift`, `panda_twoarm_peginhole`, `panda_pickplace_can`, `panda_nut_assembly_square`, `jaco_door`, `jaco_lift`, and `jaco_twoarm_peginhole`.
To train an RL agent (DrQ-v2) on a Franka-Kitchen task, run:

```
python train_rl.py \
    agent=drqv2 \
    suite=kitchen \
    suite/kitchen_task=turn_knob \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_train_frames_drq=1100000 replay_buffer_size=500000 suite.num_seed_frames=4000 batch_size=512 \
    use_wandb=true seed=1 exp_prefix=RL
```
- `suite/kitchen_task` can be set to `turn_knob`, `turn_light_on`, `slide_door`, `open_door`, and `open_micro`.
- We train RL agents for 1.1M environment steps on Franka-Kitchen, hence `num_train_frames_drq=1100000`.
To train a behavior cloning (BC) agent on a Meta-World task, run:

```
python train_bc.py \
    agent=bc \
    suite=metaworld \
    suite/metaworld_task=hammer \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_demos=25 \
    use_wandb=true seed=1 exp_prefix=BC
```
- For Meta-World, the maximum value of `num_demos` is 25.
To train a BC agent on a Robosuite task, run:

```
python train_bc.py \
    agent=bc \
    suite=robosuite \
    suite/robosuite_task=panda_door \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_demos=50 \
    use_wandb=true seed=1 exp_prefix=BC
```
- For Robosuite, the maximum value of `num_demos` is 50.
To train a BC agent on a Franka-Kitchen task, run:

```
python train_bc.py \
    agent=bc \
    suite=kitchen \
    suite/kitchen_task=turn_knob \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_demos=25 \
    use_wandb=true seed=1 exp_prefix=BC
```
- For Franka-Kitchen, the maximum value of `num_demos` is 25.
To train an agent via imitation learning with a visual reward function (VRF) on a Meta-World task, run:

```
python train_vrf.py \
    agent=potil \
    suite=metaworld \
    suite/metaworld_task=hammer \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    bc_regularize=true num_demos=1 \
    use_wandb=true seed=1 exp_prefix=VRF
```
To train a VRF agent on a Robosuite task, run:

```
python train_vrf.py \
    agent=potil \
    suite=robosuite \
    suite/robosuite_task=panda_door \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    bc_regularize=true num_demos=1 \
    use_wandb=true seed=1 exp_prefix=VRF
```
To train a VRF agent on a Franka-Kitchen task, run:

```
python train_vrf.py \
    agent=potil \
    suite=kitchen \
    suite/kitchen_task=turn_knob \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    bc_regularize=true num_demos=1 \
    use_wandb=true seed=1 exp_prefix=VRF
```
We have modified and integrated code from ROT and DrQ-v2 into this project.
If you find this repository useful, please consider giving a star ⭐ and citing:

```
@article{hu2023pre,
  title={For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal},
  author={Hu, Yingdong and Wang, Renhao and Li, Li Erran and Gao, Yang},
  journal={arXiv preprint arXiv:2304.04591},
  year={2023}
}
```