Qi Lv¹,² · Hao Li¹ · Xiang Deng¹† · Rui Shao¹ · Michael Yu Wang² · Liqiang Nie¹†
¹Harbin Institute of Technology (Shenzhen) · ²Great Bay University
†corresponding author
ICML 2024
This work presents RoboMP2, a robotic multimodal perception-planning framework built on multimodal large language models for manipulation tasks.
Here we present the performance comparison between RoboMP2 and the baselines across the four VIMABench evaluation levels (L1–L4), reported as average success rate (%).
| Model | L1 | L2 | L3 | L4 | Avg. |
|---|---|---|---|---|---|
| *End-to-end models* | | | | | |
| Gato† | 58.1 | 53.2 | 43.5 | 12.4 | 41.8 |
| Flamingo† | 47.5 | 46.0 | 40.8 | 12.1 | 36.6 |
| VIMA† | 81.6 | 81.5 | 79.0 | 48.9 | 72.7 |
| RT-2 | 72.8 | 70.3 | 66.8 | 47.0 | 64.2 |
| *MLLM planners* | | | | | |
| CaP | 71.2 | 70.0 | 42.8 | 44.7 | 57.2 |
| VisualProg | 49.7 | 47.7 | 69.9 | 52.2 | 54.9 |
| I2A† | 77.0 | 76.2 | 66.6 | 49.8 | 65.0 |
| **RoboMP2 (Ours)** | **89.0** | **85.9** | **86.8** | **68.0** | **82.4** |
- Install the required packages with the provided `requirements.txt`:
```
git clone https://github.com/LiheYoung/Depth-Anything
cd Depth-Anything
pip install -r requirements.txt
```
- If you cannot install `gym==0.21.0`, which this project requires, pin the build tools below first, after which gym should install successfully:
```
pip install setuptools==65.5.0
pip install --user wheel==0.38.0
```
- Install VIMABench by following the instructions in the VIMABench repository.
- Change the OpenAI API key in `data_process/gptutils.py`.
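  For reference, a minimal sketch of what the key setup in `data_process/gptutils.py` might look like, assuming the legacy `openai` (<1.0) client; the `query_gpt` helper is illustrative, not the repository's actual code:
```python
# Illustrative sketch only -- adapt to the actual contents of
# data_process/gptutils.py. Uses the legacy openai (<1.0) client.
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # <- put your key here

def query_gpt(prompt, model="gpt-3.5-turbo"):
    """Send a single-turn prompt to the chat completions endpoint."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]
```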
- Download the SentenceBert model and update the SentenceBert path in `retrieval/similarity_retrieval.py`.
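  A minimal sketch of SentenceBert-based similarity retrieval with the `sentence-transformers` package; the path constant and the `retrieve_top_k` helper are assumptions for illustration, not the code in `retrieval/similarity_retrieval.py`:
```python
# Illustrative sketch only -- the path and helper below are placeholders.
from sentence_transformers import SentenceTransformer, util

SENTENCE_BERT_PATH = "/path/to/sentence-bert"  # <- downloaded model path
model = SentenceTransformer(SENTENCE_BERT_PATH)

def retrieve_top_k(query, candidates, k=3):
    """Return the k candidate texts most similar to the query."""
    query_emb = model.encode(query, convert_to_tensor=True)
    cand_embs = model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_embs)[0]
    top = scores.topk(min(k, len(candidates)))
    return [candidates[i] for i in top.indices.tolist()]
```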
- Set the path of the MLLM in `model/custom_model.py`.
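  If the MLLM is a Hugging Face-style checkpoint, loading it from a local path could look like the sketch below; the `MLLM_PATH` name and the loader classes are assumptions, since `model/custom_model.py` may wrap its own loading logic:
```python
# Illustrative sketch only -- point MLLM_PATH at your local checkpoint.
# device_map="auto" requires the accelerate package.
from transformers import AutoProcessor, AutoModelForCausalLM

MLLM_PATH = "/path/to/your/mllm-checkpoint"  # <- change this

processor = AutoProcessor.from_pretrained(MLLM_PATH)
model = AutoModelForCausalLM.from_pretrained(MLLM_PATH, device_map="auto")
```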
- Run `eval.py`.
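  For example, from the repository root (any evaluation flags are defined in `eval.py` itself):
```
python eval.py
```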
If you find this project useful, please consider citing:
```
@inproceedings{lv2024robomp2,
  title = {RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models},
  author = {Qi Lv and Hao Li and Xiang Deng and Rui Shao and Michael Yu Wang and Liqiang Nie},
  booktitle = {International Conference on Machine Learning},
  year = {2024}
}
```