
RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Qi Lv1,2 · Hao Li1 · Xiang Deng1† · Rui Shao1 · Michael Yu Wang2 · Liqiang Nie1†

1Harbin Institute of Technology (Shenzhen)    2Great Bay University    

†Corresponding authors

ICML 2024

Paper | Project Page

This work presents RoboMP2, a multimodal perception-planning framework built on multimodal large language models for manipulation tasks.

Performance

Here we present the performance comparison between RoboMP2 and baseline models on the four VIMABench generalization levels (L1–L4).

| Model | L1 | L2 | L3 | L4 | Avg. |
| --- | --- | --- | --- | --- | --- |
| *End-to-end models* | | | | | |
| Gato | 58.1 | 53.2 | 43.5 | 12.4 | 41.8 |
| Flamingo | 47.5 | 46.0 | 40.8 | 12.1 | 36.6 |
| VIMA | 81.6 | 81.5 | 79.0 | 48.9 | 72.7 |
| RT-2 | 72.8 | 70.3 | 66.8 | 47.0 | 64.2 |
| *MLLM planners* | | | | | |
| CaP | 71.2 | 70.0 | 42.8 | 44.7 | 57.2 |
| VisualProg | 49.7 | 47.7 | 69.9 | 52.2 | 54.9 |
| I2A | 77.0 | 76.2 | 66.6 | 49.8 | 65.0 |
| RoboMP2 (Ours) | 89.0 | 85.9 | 86.8 | 68.0 | 82.4 |

Usage

Installation

  1. Install the required packages with the provided requirements.txt:

    git clone https://github.com/LiheYoung/Depth-Anything
    cd Depth-Anything
    pip install -r requirements.txt
  • If installing gym==0.21.0 (required by this project) fails, pin the build tooling first and then retry the gym install:

    pip install setuptools==65.5.0
    pip install --user wheel==0.38.0
  2. Install VIMABench by following the setup instructions in the VIMABench repository.

Running

  1. Set your OpenAI API key in data_process/gptutils.py.

  2. Download the SentenceBert model and update its path in retrieval/similarity_retrieval.py.

  3. Set the path of the MLLM in model/custom_model.py.

  4. Run eval.py.
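Rather than hardcoding the key when editing data_process/gptutils.py in step 1, one option is to read it from an environment variable. This is a minimal sketch of that pattern; the get_api_key helper below is hypothetical and not part of this repository:

```python
import os

def get_api_key():
    """Read the OpenAI API key from the environment, failing loudly if unset.

    Hypothetical helper -- the repo expects the key to be edited directly
    into data_process/gptutils.py.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key
```

This keeps the secret out of version control while leaving the rest of the pipeline unchanged.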
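The retriever configured in step 2 selects stored examples whose SentenceBert embeddings are closest to the query's. The sketch below illustrates that kind of top-k cosine-similarity selection with toy hand-written vectors standing in for real SentenceBert embeddings; retrieve_top_k and the vectors are illustrative, not the actual code in retrieval/similarity_retrieval.py:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, candidate_vecs, k=1):
    """Return indices of the k candidates most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i)
              for i, v in enumerate(candidate_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy embeddings standing in for SentenceBert outputs.
query = [1.0, 0.0, 1.0]
candidates = [[0.0, 1.0, 0.0],   # orthogonal to the query
              [1.0, 0.1, 0.9],   # nearly parallel to the query
              [0.5, 0.5, 0.5]]
print(retrieve_top_k(query, candidates, k=1))  # → [1]
```

In the real pipeline, the query and candidates would be embedded by the downloaded SentenceBert model before this similarity ranking is applied.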

Citation

If you find this project useful, please consider citing:

@inproceedings{lv2024robomp2,
    title     = {RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models},
    author    = {Qi Lv and Hao Li and Xiang Deng and Rui Shao and Michael Yu Wang and Liqiang Nie},
    booktitle = {International Conference on Machine Learning},
    year      = {2024}
}