This is the official repository for the RIO benchmark, including dataset downloads, baseline implementations, and validation. If you find our dataset helpful in your work, please ⭐star and 📖cite our work.
Our work has been accepted to the NeurIPS 2023 Datasets and Benchmarks Track. We will progressively evaluate the performance of MLLMs on RIO.
You can follow the steps in `example.ipynb` to read and visualize sample ground-truth annotations.
Here is a ground-truth example from the dataset:
The intention description is "you can use the thing to cut the food on the table".
More examples can be found in `./examples`.
The annotation contains the following fields:
- `height`: height of the image.
- `width`: width of the image.
- `image_id`: COCO image ID.
- `task_id`: per-intention ID.
- `expressions`: the intention description, e.g., "You can use the thing to cook meal."
- `bbox_list`: bounding-box annotations of all instances of the class that can satisfy the intention.
- `mask_list`: mask annotations of all instances of the class that can satisfy the intention.
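The fields above can be read with a short Python sketch. It assumes an annotation file is a top-level JSON list of records carrying exactly these keys. Because the box format and mask encoding are not specified here, the sketch only counts boxes instead of decoding them. `load_annotations`, `check_record`, and `describe` are hypothetical helper names, not part of this repository.

```python
from __future__ import annotations

import json
from typing import Any

# Keys listed in the annotation description above.
REQUIRED_KEYS = ("height", "width", "image_id", "task_id",
                 "expressions", "bbox_list", "mask_list")


def check_record(record: dict[str, Any]) -> dict[str, Any]:
    """Raise KeyError if a record lacks any of the documented fields."""
    missing = [k for k in REQUIRED_KEYS if k not in record]
    if missing:
        raise KeyError(f"task {record.get('task_id')}: missing {missing}")
    return record


def load_annotations(path: str) -> list[dict[str, Any]]:
    """Load and check records; ASSUMES the file is a top-level JSON list."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    return [check_record(r) for r in records]


def describe(record: dict[str, Any]) -> str:
    """One-line summary of a record.

    ASSUMPTION: `expressions` may be a single string or a list of strings,
    so both cases are handled. Boxes are counted, not interpreted.
    """
    expr = record["expressions"]
    text = expr if isinstance(expr, str) else "; ".join(expr)
    return (f"image {record['image_id']} ({record['width']}x{record['height']}), "
            f"task {record['task_id']}: {text} -> {len(record['bbox_list'])} instance(s)")
```

Usage, with a hypothetical file name: `for r in load_annotations("rio_annotations.json"): print(describe(r))`.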
Our dataset is built on the COCO 2014 dataset, whose images can be downloaded from MSCOCO. The training-set images come from the COCO train set, and the images of the common and uncommon test sets come from the COCO val set.
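Since RIO reuses COCO 2014 images, an `image_id` can be mapped to an image file using the standard COCO 2014 naming scheme, `COCO_<split>_<12-digit zero-padded id>.jpg`. The sketch below assumes images are stored under `data/images/<split>/`, as in the `data` folder layout used for the TOIST baseline; `coco2014_image_path` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def coco2014_image_path(image_root: str, image_id: int, split: str) -> Path:
    """Build the path of a COCO 2014 image from its id.

    split is "train2014" for RIO training images and "val2014" for the
    common/uncommon test images. ASSUMPTION: images sit in
    <image_root>/<split>/ with their original COCO file names.
    """
    if split not in ("train2014", "val2014"):
        raise ValueError(f"unexpected split: {split}")
    return Path(image_root) / split / f"COCO_{split}_{image_id:012d}.jpg"
```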
You can download the latest version of the data annotations from here.
We provide checkpoints for the baseline models. To verify their performance on the RIO dataset, set up the environment following the original repositories, then use our modified code and scripts.
```bash
cd baselines/TOIST_RIO
```
TOIST is built on top of the MDETR repository, and the two baselines differ only in distillation: TOIST is the distilled model and MDETR the undistilled one.
- We have organized the data into the same format; download it from here and put it into the `data/coco-tasks/annotations/` directory.
Organize the `data` folder as follows:
```
data/
├── id2name.json
├── images/
│   ├── train2014/
│   └── val2014/
└── coco-tasks/
    └── annotations/
        ├── refcoco_task_test_common.json
        ├── refcoco_task_test_uncommon.json
        └── refcoco_task_train.json
```
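Before running the evaluation scripts, a quick check that the `data` folder matches the layout above can save a failed run. This is a minimal sketch using only the standard library; `EXPECTED` and `missing_entries` are hypothetical names, and the list simply mirrors the tree above.

```python
from __future__ import annotations

from pathlib import Path

# Entries from the expected `data/` layout above, relative to the data root.
EXPECTED = [
    "id2name.json",
    "images/train2014",
    "images/val2014",
    "coco-tasks/annotations/refcoco_task_test_common.json",
    "coco-tasks/annotations/refcoco_task_test_uncommon.json",
    "coco-tasks/annotations/refcoco_task_train.json",
]


def missing_entries(data_root: str = "data") -> list[str]:
    """Return the expected entries that do not exist under data_root."""
    root = Path(data_root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("data/ layout looks complete.")
```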
- Download the corresponding model checkpoint, then run `bash scripts/test_wo_distill_seg.sh` to evaluate MDETR, or `bash scripts/test_w_distill_seg.sh` to evaluate TOIST. To choose between the "common" and "uncommon" test sets, set `refexp_test_set` in `configs/tdod_rio.json`.
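Switching the test split can also be scripted. The sketch below assumes `configs/tdod_rio.json` is a plain JSON object with a top-level `refexp_test_set` key that accepts the strings "common" and "uncommon", as the step above suggests; check the value already in the file before relying on it. `set_test_split` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path

CONFIG = Path("configs/tdod_rio.json")


def set_test_split(split: str, config_path: Path = CONFIG) -> None:
    """Set `refexp_test_set` in the evaluation config.

    ASSUMPTION: the key is top-level and takes "common" or "uncommon".
    Note that the file is re-serialized, so its whitespace may change.
    """
    if split not in ("common", "uncommon"):
        raise ValueError(f"split must be 'common' or 'uncommon', got {split!r}")
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    if "refexp_test_set" not in cfg:
        raise KeyError(f"'refexp_test_set' not found in {config_path}")
    cfg["refexp_test_set"] = split
    config_path.write_text(json.dumps(cfg, indent=4) + "\n", encoding="utf-8")
```

Run `set_test_split("uncommon")` from `baselines/TOIST_RIO` before the test script; editing the key by hand is equivalent.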
Model | AP50_Det | Top1_Det | mIoU_Seg | Top1_Seg | Checkpoint |
---|---|---|---|---|---|
MDETR@common | 48.61 | 65.05 | 44.14 | 54.55 | link |
MDETR@uncommon | 24.20 | 39.60 | 22.03 | 34.35 | link |
TOIST@common | 49.05 | 66.72 | 45.07 | 55.85 | link |
TOIST@uncommon | 21.96 | 39.28 | 19.41 | 34.00 | link |
```bash
cd baselines/polygon-transformer
```
Please refer to PolyFormer for environment configuration, including `fairseq` and `refer`.
Train
- Download the pre-processed TSV data for the training set from here and put it into `datasets/finetune/reftask/`.
- Run the training script from `run_scripts/finetune/`:
```bash
cd run_scripts/finetune/
bash train_polyformer_b_reftask.sh
```
Evaluation
- Download the pre-processed TSV data from here and put it into `datasets/finetune/reftask/`.
- Download our trained model from here.
- Run the evaluation script from `run_scripts/evaluation/`:
```bash
cd run_scripts/evaluation/
bash evaluate_polyformer_b_reftask.sh
```
Model | mIoU_common | mIoU_uncommon | Checkpoint |
---|---|---|---|
PolyFormer | 46.16 | 26.77 | link |
Demo
The original PolyFormer repository has updated `demo.py`; with our model weights trained on the RIO dataset, you can directly test intention-oriented segmentation in the demo.
Our data loading and inference scripts are built on the MDETR, TOIST, SeqTR, and PolyFormer repositories.