Qi Lv¹,² · Hao Li¹ · Xiang Deng¹† · Rui Shao¹ · Michael Yu Wang² · Liqiang Nie¹†
¹Harbin Institute of Technology (Shenzhen) · ²Great Bay University
†corresponding author
ICML 2024
This work presents RoboMP2, a robotic multimodal perception-planning framework built on multimodal large language models for manipulation tasks.
Here we present the performance comparison between RoboMP2 and the baselines across the four VIMABench evaluation levels (L1–L4), reported as average success rate (%).
| Model | L1 | L2 | L3 | L4 | Avg. |
|---|---|---|---|---|---|
| *End-to-end models* | | | | | |
| Gato† | 58.1 | 53.2 | 43.5 | 12.4 | 41.8 |
| Flamingo† | 47.5 | 46.0 | 40.8 | 12.1 | 36.6 |
| VIMA† | 81.6 | 81.5 | 79.0 | 48.9 | 72.7 |
| RT-2 | 72.8 | 70.3 | 66.8 | 47.0 | 64.2 |
| *MLLM planners* | | | | | |
| CaP | 71.2 | 70.0 | 42.8 | 44.7 | 57.2 |
| VisualProg | 49.7 | 47.7 | 69.9 | 52.2 | 54.9 |
| I2A† | 77.0 | 76.2 | 66.6 | 49.8 | 65.0 |
| **RoboMP2 (Ours)** | **89.0** | **85.9** | **86.8** | **68.0** | **82.4** |
- Install the required packages with the provided `requirements.txt`:
```
git clone https://github.com/LiheYoung/Depth-Anything
cd Depth-Anything
pip install -r requirements.txt
```
- If you cannot install `gym==0.21.0`, which this project requires, pin the build tools below first, after which gym should install successfully:
```
pip install setuptools==65.5.0
pip install --user wheel==0.38.0
```
- Install VIMABench by following the instructions in the VIMABench repository.
- Change the OpenAI API key in `data_process/gptutils.py`.
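  For reference, a minimal sketch of what the key setup in `data_process/gptutils.py` might look like, assuming the legacy `openai` (<1.0) client; the `query_gpt` helper is illustrative, not the repository's actual code:
```python
# Illustrative sketch only -- adapt to the actual contents of
# data_process/gptutils.py. Uses the legacy openai (<1.0) client.
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # <- put your key here

def query_gpt(prompt, model="gpt-3.5-turbo"):
    """Send a single-turn prompt to the chat completions endpoint."""
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]
```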
- Download the SentenceBert model and update the SentenceBert path in `retrieval/similarity_retrieval.py`.
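  A minimal sketch of SentenceBert-based similarity retrieval with the `sentence-transformers` package; the path constant and the `retrieve_top_k` helper are assumptions for illustration, not the code in `retrieval/similarity_retrieval.py`:
```python
# Illustrative sketch only -- the path and helper below are placeholders.
from sentence_transformers import SentenceTransformer, util

SENTENCE_BERT_PATH = "/path/to/sentence-bert"  # <- downloaded model path
model = SentenceTransformer(SENTENCE_BERT_PATH)

def retrieve_top_k(query, candidates, k=3):
    """Return the k candidate texts most similar to the query."""
    query_emb = model.encode(query, convert_to_tensor=True)
    cand_embs = model.encode(candidates, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_embs)[0]
    top = scores.topk(min(k, len(candidates)))
    return [candidates[i] for i in top.indices.tolist()]
```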
- Set the path of the MLLM in `model/custom_model.py`.
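  If the MLLM is a Hugging Face-style checkpoint, loading it from a local path could look like the sketch below; the `MLLM_PATH` name and the loader classes are assumptions, since `model/custom_model.py` may wrap its own loading logic:
```python
# Illustrative sketch only -- point MLLM_PATH at your local checkpoint.
# device_map="auto" requires the accelerate package.
from transformers import AutoProcessor, AutoModelForCausalLM

MLLM_PATH = "/path/to/your/mllm-checkpoint"  # <- change this

processor = AutoProcessor.from_pretrained(MLLM_PATH)
model = AutoModelForCausalLM.from_pretrained(MLLM_PATH, device_map="auto")
```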
- Run `eval.py`.
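  For example, from the repository root (any evaluation flags are defined in `eval.py` itself):
```
python eval.py
```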
If you find this project useful, please consider citing:
```
@inproceedings{lv2024robomp2,
  title = {RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models},
  author = {Qi Lv and Hao Li and Xiang Deng and Rui Shao and Michael Yu Wang and Liqiang Nie},
  booktitle = {International Conference on Machine Learning},
  year = {2024}
}
```