This is a repository containing the code for the paper:

**For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal**
Yingdong Hu, Renhao Wang, Li Erran Li, and Yang Gao
ICML 2023
- Install the following system libraries:

```
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3
```
- Set up the conda environment:

```
conda env create -f conda_env.yml
conda activate pvm
```
- Install PyTorch, torchvision, and timm following the official instructions. For example:

```
conda install pytorch==1.12.1 torchvision==0.13.1 cudatoolkit=11.6 -c pytorch -c conda-forge
pip install timm==0.4.5
```
- Install MuJoCo version 2.1 and mujoco-py:
  - Follow the instructions in the mujoco-py package.
  - Make sure that the GPU version of mujoco-py gets built, so that image rendering is fast. An easy way to ensure this is to clone the mujoco-py repository, change this line to `Builder = LinuxGPUExtensionBuilder`, and install from source by running `pip install -e .` in the `mujoco-py` root directory. You can also download our modified mujoco-py package and install it from source.
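To check which builder was actually used, you can inspect the compiled Cython extension after importing mujoco-py. This is a heuristic based on how mujoco-py names its build artifacts, and the exact module name can vary across versions:

```python
# Heuristic sanity check: on a successful GPU build, the compiled extension's
# module name typically contains "linuxgpuextensionbuilder".
import mujoco_py  # the first import triggers the cymj build

print(mujoco_py.cymj)  # e.g. <module 'cymj_..._linuxgpuextensionbuilder_...'>
```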
- Install Meta-World. Download the package from here, then install it from source:

```
pip install -e /path/to/dir/metaworld
```
- Install Robosuite. We use the `offline_study` branch of Robosuite; download it from here, then install it from source:

```
pip install -e /path/to/dir/robosuite-offline_study
```
- Install Franka-Kitchen. Follow the instructions in the R3M repository. Unlike R3M, we only randomize the pose of the robot arm between episodes, not the kitchen, so be sure to add the line `FIXED_ENTRY_POINT = RANDOM_ENTRY_POINT` here: https://github.com/vikashplus/mj_envs/blob/stable/mj_envs/envs/relay_kitchen/__init__.py#L160. Note that we use `RANDOM_ENTRY_POINT`, not `RANDOM_DESK_ENTRY_POINT`.
The pre-trained vision models evaluated in this project are listed below.

| Model | Architecture | Highlights | Link |
|---|---|---|---|
| MoCo v2 | ResNet-50 | Contrastive learning, momentum encoder | download |
| SwAV | ResNet-50 | Contrast online cluster assignments | download |
| SimSiam | ResNet-50 | Without negative pairs | download |
| DenseCL | ResNet-50 | Dense contrastive learning, learn local features | download |
| PixPro | ResNet-50 | Pixel-level pretext task, learn local features | download |
| VICRegL | ResNet-50 | Learn global and local features | download |
| VFS | ResNet-50 | Encode temporal dynamics | download |
| R3M | ResNet-50 | Learn visual representations for robotics | download |
| VIP | ResNet-50 | Learn representations and reward for robotics | download |
| MoCo v3 | ViT-B/16 | Contrastive learning for ViT | download |
| DINO | ViT-B/16 | Self-distillation with no labels | download |
| MAE | ViT-B/16 | Masked image modeling (MIM) | download |
| iBOT | ViT-B/16 | Combine self-distillation with MIM | download |
| CLIP | ViT-B/16 | Language-supervised pre-training | download |
After downloading a pre-trained vision model, place it under the `PVM-Robotics/pretrained/` folder. Please don't modify the file names of these checkpoints.
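As a quick sanity check, you can load a downloaded checkpoint on CPU and inspect it. The file name below is a placeholder (keep the original name of whichever checkpoint you downloaded), and the `state_dict` unwrapping is a guess at common checkpoint layouts rather than a guarantee about these particular files:

```python
# Sanity-check a downloaded checkpoint (placeholder file name).
import torch

ckpt = torch.load("pretrained/mocov2-resnet50.pth", map_location="cpu")
# Released checkpoints may store weights directly, or nest them under a key
# such as "state_dict" or "model".
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"loaded {len(state_dict)} entries")
```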
- Download the expert demonstrations for all tasks from here.
- Unzip `expert_demos.zip` and place the `expert_demos` directory into `PVM-Robotics/expert_demos`.
- Set the `path/to/dir` portion of the `root_dir` path variable in `cfgs/config.yaml` to the path of the PVM-Robotics repository.
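If later commands fail to find files, a quick way to confirm the layout described above is to check that the expected paths resolve. The `root_dir` value here uses the same placeholder as `cfgs/config.yaml`:

```python
# Verify the repository layout the configs expect.
import os

root_dir = "/path/to/dir/PVM-Robotics"  # same value as root_dir in cfgs/config.yaml
for sub in ("pretrained", "expert_demos", "cfgs/config.yaml"):
    path = os.path.join(root_dir, sub)
    print(f"{path}: {'ok' if os.path.exists(path) else 'MISSING'}")
```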
To train an RL agent (DrQ-v2) on a Meta-World task, run:

```
python train_rl.py \
    agent=drqv2 \
    suite=metaworld \
    suite/metaworld_task=hammer \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    replay_buffer_size=500000 suite.num_seed_frames=4000 batch_size=512 \
    use_wandb=true seed=1 exp_prefix=RL
```
- `suite/metaworld_task` can be set to `hammer`, `drawer_close`, `door_open`, `bin_picking`, `button_press_topdown`, `window_close`, `lever_pull`, and `coffee_pull`.
- When `agent.backbone` is set to `resnet`, `agent.embedding_name` can be set to `mocov2-resnet50`, `simsiam-resnet50`, `swav-resnet50`, `densecl-resnet50`, `pixpro-resnet50`, `vicregl-resnet50`, `vfs-resnet50`, `r3m-resnet50`, and `vip-resnet50_VIPfc`.
- When `agent.backbone` is set to `vit`, `agent.embedding_name` can be set to `mocov3-vit-b16`, `dino-vit-b16`, `ibot-vit-b16`, `clip-vit-b16`, and `mae-vit-b16`. (A minimal sketch of the frozen feature extraction these backbones perform follows this list.)
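Regardless of which name is selected, the backbone plays the same role: it stays frozen and maps each image observation to a fixed embedding that the policy consumes. The sketch below is illustrative only; it uses a randomly initialized torchvision ResNet-50 rather than the project's actual checkpoint-loading code:

```python
# Illustrative only: a frozen ResNet-50 turning an RGB frame into a 2048-d
# embedding, mirroring the role of agent.backbone / agent.embedding_name.
import torch
import torchvision

backbone = torchvision.models.resnet50()
backbone.fc = torch.nn.Identity()  # drop the classification head
backbone.eval()                    # the backbone stays frozen

frame = torch.randn(1, 3, 224, 224)  # dummy RGB observation
with torch.no_grad():
    embedding = backbone(frame)
print(embedding.shape)  # torch.Size([1, 2048])
```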
To train an RL agent (DrQ-v2) on a Robosuite task, run:

```
python train_rl.py \
    agent=drqv2 \
    suite=robosuite \
    suite/robosuite_task=panda_door \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    replay_buffer_size=500000 suite.num_seed_frames=4000 batch_size=512 \
    use_wandb=true seed=1 exp_prefix=RL
```
- `suite/robosuite_task` can be set to `panda_door`, `panda_lift`, `panda_twoarm_peginhole`, `panda_pickplace_can`, `panda_nut_assembly_square`, `jaco_door`, `jaco_lift`, and `jaco_twoarm_peginhole`.
To train an RL agent (DrQ-v2) on a Franka-Kitchen task, run:

```
python train_rl.py \
    agent=drqv2 \
    suite=kitchen \
    suite/kitchen_task=turn_knob \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_train_frames_drq=1100000 replay_buffer_size=500000 suite.num_seed_frames=4000 batch_size=512 \
    use_wandb=true seed=1 exp_prefix=RL
```
- `suite/kitchen_task` can be set to `turn_knob`, `turn_light_on`, `slide_door`, `open_door`, and `open_micro`.
- We train RL agents for 1.1M environment steps on Franka-Kitchen, hence `num_train_frames_drq=1100000`.
To train a behavior cloning (BC) agent on a Meta-World task, run:

```
python train_bc.py \
    agent=bc \
    suite=metaworld \
    suite/metaworld_task=hammer \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_demos=25 \
    use_wandb=true seed=1 exp_prefix=BC
```
- For Meta-World, the maximum value of `num_demos` is 25.
To train a BC agent on a Robosuite task, run:

```
python train_bc.py \
    agent=bc \
    suite=robosuite \
    suite/robosuite_task=panda_door \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_demos=50 \
    use_wandb=true seed=1 exp_prefix=BC
```
- For Robosuite, the maximum value of `num_demos` is 50.
To train a BC agent on a Franka-Kitchen task, run:

```
python train_bc.py \
    agent=bc \
    suite=kitchen \
    suite/kitchen_task=turn_knob \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    num_demos=25 \
    use_wandb=true seed=1 exp_prefix=BC
```
- For Franka-Kitchen, the maximum value of `num_demos` is 25.
To train an agent via imitation learning with a visual reward function (VRF) on a Meta-World task, run:

```
python train_vrf.py \
    agent=potil \
    suite=metaworld \
    suite/metaworld_task=hammer \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    bc_regularize=true num_demos=1 \
    use_wandb=true seed=1 exp_prefix=VRF
```
To train a VRF agent on a Robosuite task, run:

```
python train_vrf.py \
    agent=potil \
    suite=robosuite \
    suite/robosuite_task=panda_door \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    bc_regularize=true num_demos=1 \
    use_wandb=true seed=1 exp_prefix=VRF
```
To train a VRF agent on a Franka-Kitchen task, run:

```
python train_vrf.py \
    agent=potil \
    suite=kitchen \
    suite/kitchen_task=turn_knob \
    agent.backbone=resnet \
    agent.embedding_name=mocov2-resnet50 \
    bc_regularize=true num_demos=1 \
    use_wandb=true seed=1 exp_prefix=VRF
```
We have modified and integrated code from ROT and DrQ-v2 into this project.
If you find this repository useful, please consider giving a star ⭐ and citing:

```
@article{hu2023pre,
  title={For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal},
  author={Hu, Yingdong and Wang, Renhao and Li, Li Erran and Gao, Yang},
  journal={arXiv preprint arXiv:2304.04591},
  year={2023}
}
```