ManipLLM

The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)

Acknowledgement

This repo benefits from LLaMA-Adapter and Where2Act. Thanks for their wonderful work.

Setup

  1. conda create --name manipllm python=3.8

  2. conda activate manipllm

  3. pip install -r requirements.txt
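
Optionally, verify the environment after installation. The sketch below is a minimal sanity check (not part of the repo); it assumes PyTorch is pulled in by requirements.txt and that a CUDA-capable GPU is visible.

    # env_check.py (hypothetical helper): confirms PyTorch imports and
    # reports the visible GPU and its memory.
    import torch

    print("PyTorch version:", torch.__version__)
    print("CUDA available: ", torch.cuda.is_available())
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, {props.total_memory / 1024 ** 3:.1f} GB")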

Data Collection

  • Collect the data yourself: download the PartNet-Mobility URDFs from its official website and place them under ./ManipLLM/data_collection/asset.
    ./asset/original_sapien_dataset
      ├── 148
      │   └── mobility.urdf
      ├── 149
      │   └── mobility.urdf
      ├── ...
      │   ...
      └── ...
    
    cd ./ManipLLM/data_collection/code
    
    bash scripts/run_gen_offline_data.sh
    

This command will first generate the training dataset and then the testing dataset.
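
For a quick sanity check on the generated data, the sketch below simply counts the entries in each split folder. The output root ./ManipLLM/data_collection/data is an assumption based on the 'test_data' folder used in Model Testing; adjust it to match what run_gen_offline_data.sh actually writes on your machine.

    # count_samples.py (hypothetical helper): counts entries per split folder.
    from pathlib import Path

    data_root = Path("./ManipLLM/data_collection/data")  # assumed output root
    for split_dir in sorted(p for p in data_root.iterdir() if p.is_dir()):
        n_entries = sum(1 for _ in split_dir.iterdir())
        print(f"{split_dir.name}: {n_entries} entries")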

Model Training

  • Preparation:

    Download the checkpoints for CLIP and LLaMA-Adapter and place them under /ManipLLM/train/ckpts. Obtain the LLaMA backbone weights using this form. Please note that checkpoints from unofficial sources (e.g., BitTorrent) may contain malicious code and should be used with care. Organize the downloaded checkpoints in the following structure:

    ./ckpts/llama_model_weights
    ├── 7B
    │   ├── checklist.chk
    │   ├── consolidated.00.pth
    │   └── params.json
    └── tokenizer.model
    ./ckpts/BIAS_LORA_NORM-336-Chinese-7B.pth
    ./ckpts/ViT-L-14-336px.pt
    
  • Model training: Training requires a server with at least 40 GB of memory. The command below will first generate the training JSON and then start training (a pre-flight sketch that checks the checkpoint layout and available GPU memory follows this list):

    cd ./ManipLLM/train
    
    bash finetune.sh
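
Before launching finetune.sh, you can run the pre-flight sketch referenced above. It is not part of the repo: it only verifies the checkpoint layout listed under Preparation and reports the available GPU memory against the stated 40 GB requirement.

    # preflight_train.py (hypothetical helper), run from ./ManipLLM/train:
    # verifies the checkpoint layout and reports available GPU memory.
    from pathlib import Path
    import torch

    required = [
        Path("./ckpts/llama_model_weights/7B/checklist.chk"),
        Path("./ckpts/llama_model_weights/7B/consolidated.00.pth"),
        Path("./ckpts/llama_model_weights/7B/params.json"),
        Path("./ckpts/llama_model_weights/tokenizer.model"),
        Path("./ckpts/BIAS_LORA_NORM-336-Chinese-7B.pth"),  # LLaMA-Adapter
        Path("./ckpts/ViT-L-14-336px.pt"),                  # CLIP
    ]
    for path in required:
        status = "ok" if path.exists() else "MISSING"
        print(f"[{status:>7}] {path}")

    if torch.cuda.is_available():
        mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
        print(f"GPU memory: {mem_gb:.1f} GB (README requires at least 40 GB)")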
    

Model Testing

  • The public code only runs inference on the final prompt, without chain-of-thought, predicting the pose directly.

  • Remember to also place the checkpoints of CLIP, LLaMA (obtained the same way as in training), and LLaMA-Adapter under /ManipLLM/test/ckpts.

  • We release the checkpoint: checkpoint-9-ori.pth. Note that, due to randomness in data collection, the provided testing dataset differs from the one used in the paper, so you may obtain slightly different but comparable results. Download the released checkpoint-9-ori or use your own trained checkpoint. The link we provide is a Baidu Yun download link; if you need a Google Drive link, send your Google account to xl3062@columbia.edu by email and we will share the link with you. Remember to change line 5 in test.sh to the directory where you placed the checkpoints.

  • Download OUR test data or collect the test data yourself. The downloaded 'test_data' folder should be unzipped under /ManipLLM/data_collection/data. Download the PartNet-Mobility URDFs from its official website and place them under /ManipLLM/data_collection/asset.

  • Testing requires a server with at least 40 GB of memory. The command below will first use the model to infer on all test samples and then interact with the objects in the SAPIEN simulator. A pre-flight check sketch follows at the end of this section.

    cd ./ManipLLM/test
    
    bash test.sh
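
As referenced above, the pre-flight sketch below checks that the test checkpoints, test data, and assets are in place before running test.sh. It is not part of the repo; in particular, the location of checkpoint-9-ori.pth under test/ckpts is an assumption, so adjust the paths to wherever you placed the files (and to whatever line 5 of test.sh points at).

    # preflight_test.py (hypothetical helper), run from the directory that
    # contains ManipLLM: checks test checkpoints, test data, and assets.
    from pathlib import Path

    required = [
        Path("./ManipLLM/test/ckpts/ViT-L-14-336px.pt"),                           # CLIP
        Path("./ManipLLM/test/ckpts/llama_model_weights/7B/consolidated.00.pth"),  # LLaMA backbone
        Path("./ManipLLM/test/ckpts/checkpoint-9-ori.pth"),                        # released ckpt (assumed location)
        Path("./ManipLLM/data_collection/data/test_data"),                         # unzipped test data
        Path("./ManipLLM/data_collection/asset/original_sapien_dataset"),          # PartNet-Mobility URDFs
    ]
    for path in required:
        status = "ok" if path.exists() else "MISSING"
        print(f"[{status:>7}] {path}")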