The official codebase for ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation (CVPR 2024)
This repo benefits from LLaMA-Adapter and Where2Act. Thanks for their wonderful work.
- conda create --name manipllm python=3.8
- conda activate manipllm
- pip install -r requirements.txt
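Optionally, you can confirm the environment resolves PyTorch and sees a GPU before moving on. This is just a quick sanity check (it assumes requirements.txt installs PyTorch), not an official setup step:

```bash
# Quick sanity check (optional): PyTorch imports and CUDA is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```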
- Collect data on your own: Download the PartNet-Mobility URDFs from the official website and place them under ./ManipLLM/data_collection/asset.
```
./asset/original_sapien_dataset
├── 148
│   └── mobility.urdf
├── 149
│   └── mobility.urdf
├── ...
│   ...
└── ...
```

```
cd ./ManipLLM/data_collection/code
bash scripts/run_gen_offline_data.sh
```
This command will first generate the training dataset and then generate the testing dataset.
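If data generation fails, a common cause is a misplaced asset folder. As a quick optional check, assuming the layout shown above, you can count how many object folders actually contain a mobility.urdf:

```bash
# Optional sanity check: count object folders that contain a mobility.urdf.
find ./ManipLLM/data_collection/asset/original_sapien_dataset -name mobility.urdf | wc -l
```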
- Preparation: Download the checkpoints for CLIP and LLaMA-Adapter. The downloaded checkpoints should be placed under /ManipLLM/train/ckpts. Obtain the LLaMA backbone weights using this form. Please note that checkpoints from unofficial sources (e.g., BitTorrent) may contain malicious code and should be used with care. Organize the downloaded checkpoints in the following structure:
```
./ckpts/llama_model_weights
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
└── tokenizer.model
./ckpts/BIAS_LORA_NORM-336-Chinese-7B.pth
./ckpts/ViT-L-14-336px.pt
```
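Before training, it can save time to verify that every expected file is in place. The paths below simply mirror the layout above and assume the check is run from ./ManipLLM/train:

```bash
# Optional: verify the checkpoints are where the training scripts expect them
# (run from ./ManipLLM/train; ls will report any missing file).
ls ./ckpts/llama_model_weights/7B/checklist.chk \
   ./ckpts/llama_model_weights/7B/consolidated.00.pth \
   ./ckpts/llama_model_weights/7B/params.json \
   ./ckpts/llama_model_weights/tokenizer.model \
   ./ckpts/BIAS_LORA_NORM-336-Chinese-7B.pth \
   ./ckpts/ViT-L-14-336px.pt
```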
- Model training: Training requires a server with at least 40 GB of memory. The command below will first generate the training JSON and then start training:
```
cd ./ManipLLM/train
bash finetune.sh
```
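On a shared server you may want to pin the run to a single GPU and keep a log. This is generic shell usage built on the standard CUDA_VISIBLE_DEVICES variable, not a flag provided by finetune.sh:

```bash
# Optional: restrict the run to GPU 0 and log output (generic shell usage).
CUDA_VISIBLE_DEVICES=0 nohup bash finetune.sh > train.log 2>&1 &
```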
- The public code only runs inference on the final prompt without chain-of-thought, predicting the pose directly.
- Remember to also add the checkpoints of CLIP, LLaMA (same process as in training), and LLaMA-Adapter under /ManipLLM/test/ckpts.
- We release the checkpoint: checkpoint-9-ori.pth. Note that, due to randomness in data collection, the provided testing dataset differs from the one used in the paper, so you may get slightly different but comparable results to those reported in the paper. Download the released checkpoint-9-ori or use your own trained checkpoint. The link we provide is a Baidu Yun download link; if you need a Google Drive link, send your Google account via email to xl3062@columbia.edu and we will share the link with you. Remember to change line 5 in test.sh to the directory where you placed the checkpoints.
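If you want to make sure the downloaded checkpoint is intact before launching the full evaluation, loading it on CPU and printing its top-level keys is usually enough. The path below is just the file name mentioned above, so adjust it to wherever you stored the checkpoint:

```bash
# Optional: confirm the checkpoint deserializes (assumes PyTorch is installed).
python -c "import torch; ckpt = torch.load('checkpoint-9-ori.pth', map_location='cpu'); print(type(ckpt)); print(list(ckpt.keys())[:5] if isinstance(ckpt, dict) else 'not a dict')"
```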
- Download OUR test data or collect the test data on your own. The downloaded 'test_data' folder should be unzipped under /ManipLLM/data_collection/data. Download the PartNet-Mobility URDFs from the official website and place them under /ManipLLM/data_collection/asset.
- Testing requires a server with at least 40 GB of memory. This command will first use the model to infer on all the test samples, and then interact with the objects in the simulator (SAPIEN):
```
cd ./ManipLLM/test
bash test.sh
```