Code Repository for Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations
by Puhao Li *, Tengyu Liu *, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang
Ag2Manip enables various manipulation tasks in scenarios where domain-specific demonstrations are unavailable. With agent-agnostic visual and action representations, Ag2Manip: (a) learns from human manipulation videos; (b) acquires diverse manipulation skills autonomously in simulation; and (c) supports robust imitation learning of manipulation skills in the real world.
Enhancing the ability of robotic systems to autonomously acquire novel manipulation skills is vital for applications ranging from assembly lines to service robots. Existing methods (e.g., VIP, R3M) rely on learning a generalized representation for manipulation tasks but overlook (i) the domain gap between distinct embodiments and (ii) the sparseness of successful task trajectories within the embodiment-specific action space, leading to misaligned and ambiguous task representations with inferior learning efficiency. Our work addresses the above challenges by introducing Ag2Manip (Agent-Agnostic representations for Manipulation) for learning novel manipulation skills. Our approach encompasses two principal innovations: (i) a novel agent-agnostic visual representation trained on human manipulation videos with embodiments masked to ensure generalizability, and (ii) an agent-agnostic action representation that abstracts the robot’s kinematic chain into an agent proxy with a universally applicable action space to focus on the core interaction between the end-effector and the object. Through our experiments, Ag2Manip demonstrates remarkable improvements across a diverse array of manipulation tasks without necessitating domain-specific demonstrations, substantiating a significant 325% improvement in average success rate across 24 tasks from FrankaKitchen, ManiSkill, and PartManip. Further ablation studies underscore the critical role of both representations in achieving such improvements.
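To make the idea of an agent proxy more concrete, the sketch below models the end-effector as a free-floating 3D point that accepts bounded displacement commands, independent of any particular robot's joints. This is an illustrative assumption only, not the code in this repository: the class `ProxyAgent`, the `max_step` limit, and the `step` method are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ProxyAgent:
    """Hypothetical agent proxy: a free-floating point standing in for the end-effector."""

    position: Vec3 = (0.0, 0.0, 0.0)
    max_step: float = 0.05  # assumed per-step displacement limit, in metres

    def step(self, action: Vec3) -> Vec3:
        """Apply a 3D displacement, scaled down if it is longer than max_step."""
        norm = math.sqrt(sum(a * a for a in action))
        # Keep the direction of the action but cap its length at max_step.
        scale = 1.0 if norm <= self.max_step else self.max_step / norm
        self.position = tuple(p + scale * a for p, a in zip(self.position, action))
        return self.position
```

Because the action is a displacement of the proxy rather than a joint command, the same policy output could in principle be replayed on different embodiments by handing the resulting proxy trajectory to each robot's own inverse-kinematics solver.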
- Create a new conda environment and activate it:
    conda create -n ag2manip python=3.8
    conda activate ag2manip
- Install dependent libraries with pip:
    pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
    pip install -r requirements.txt
  - The code is tested on PyTorch 1.13.1 and CUDA 11.7. To use other versions of PyTorch, modify the installation command accordingly.
- Install Isaac Gym by following the official documentation.
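After completing the steps above, a quick sanity check can confirm that the installed PyTorch build matches the tested configuration. The script below is a minimal sketch and is not part of this repository; the helper names `matches_tested` and `check_environment` are hypothetical. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib

TESTED_TORCH = "1.13.1"  # versions stated in this README
TESTED_CUDA = "11.7"


def matches_tested(torch_version, cuda_version):
    """Return True if the given versions match the tested configuration."""
    # PyTorch wheels report versions such as "1.13.1+cu117"; drop the local suffix.
    base = torch_version.split("+")[0]
    return base == TESTED_TORCH and cuda_version == TESTED_CUDA


def check_environment():
    """Describe the installed PyTorch build, or report that it is missing."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed"
    ok = matches_tested(torch.__version__, torch.version.cuda)
    status = "matches" if ok else "differs from"
    return (
        f"torch {torch.__version__} (CUDA {torch.version.cuda}) {status} "
        f"the tested setup; GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(check_environment())
```

Run it inside the activated `ag2manip` environment. A mismatch is not necessarily an error, since other PyTorch versions may work, but it is a useful first thing to check when debugging.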
To access the assets for the simulated environments and the pre-trained Ag2Manip visual representation model checkpoints, please head to Google Drive.
If you find this work helpful, please consider citing us as:
@article{li2024ag2manip,
title={Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations},
author={Li, Puhao and Liu, Tengyu and Li, Yuyang and Han, Muzhi and Geng, Haoran and Wang, Shu and Zhu, Yixin and Zhu, Song-Chun and Huang, Siyuan},
journal={arXiv preprint arXiv:2404.17521},
year={2024}
}
If you have any questions about this work, feel free to contact Puhao Li at puhaoli01@gmail.com.