Implementation for the CVPR 2023 paper "Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language"
pip install -r requirements.txt
For the Charades dataset, we use the same data preparation process as MS-2D-TAN; please follow its instructions for details.
For the Charades-CG dataset, download the files from https://github.com/YYJMJC/Compositional-Temporal-Grounding/tree/main/Charades-CG and put them under `data/Charades-CG`.
To train the model, run:
python train.py --cfg experiments/charades/config.yaml
We provide a checkpoint whose results are close to those reported in the paper; the table below compares the two.
Split | Test-Trivial | | | Novel-Composition | | | Novel-Word | | |
---|---|---|---|---|---|---|---|---|---|
Metric | R1@0.5 | R1@0.7 | mIoU | R1@0.5 | R1@0.7 | mIoU | R1@0.5 | R1@0.7 | mIoU |
paper version | 58.14 | 37.98 | 50.58 | 46.54 | 25.10 | 40.00 | 50.36 | 28.78 | 43.15 |
github version | 59.34 | 38.53 | 51.48 | 45.96 | 24.67 | 40.01 | 50.07 | 29.21 | 42.79 |
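The table reports R1@0.5, R1@0.7 and mIoU. Under the definitions commonly used in temporal grounding, R1@m is the percentage of queries whose top-1 predicted moment has a temporal IoU of at least m with the ground-truth moment, and mIoU is the mean temporal IoU of the top-1 predictions. The snippet below is a minimal sketch of those definitions, assuming one ground-truth moment and one top-1 prediction per query, each given as a `(start, end)` pair in seconds. The helpers `temporal_iou` and `evaluate` are hypothetical names for illustration; `evaluate.py` in this repository may differ in details.

```python
"""Sketch of the usual temporal-grounding metrics (R1@m and mIoU).

Assumption (not taken from this repository): each query has exactly one
ground-truth moment and one top-1 predicted moment, both (start, end).
"""
from typing import Sequence, Tuple

Moment = Tuple[float, float]  # (start, end) with start <= end


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(preds: Sequence[Moment], gts: Sequence[Moment],
             thresholds: Sequence[float] = (0.5, 0.7)) -> dict:
    """Return R1@m for each threshold m and mIoU, all as percentages."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need exactly one prediction per ground-truth moment")
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    n = len(ious)
    # R1@m: share of queries whose top-1 prediction reaches IoU >= m.
    results = {f"R1@{m}": 100.0 * sum(iou >= m for iou in ious) / n
               for m in thresholds}
    # mIoU: average IoU of the top-1 predictions over all queries.
    results["mIoU"] = 100.0 * sum(ious) / n
    return results


if __name__ == "__main__":
    preds = [(0.0, 10.0), (5.0, 15.0), (20.0, 30.0)]
    gts = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]
    # IoUs are 1.0, 1/3 and 0.0, so R1@0.5 = R1@0.7 ≈ 33.33 and mIoU ≈ 44.44.
    print(evaluate(preds, gts))
```

Because only the top-1 prediction counts, a single query contributes either fully or not at all to R1@m, while mIoU also rewards partially overlapping predictions.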
To evaluate it, download the checkpoint, put it in the `ckpt` folder, and run:
python evaluate.py --cfg experiments/charades/config.yaml --load ckpt/MS-2D-TAN_iter20461.pt
If any part of our paper or code is helpful to your work, please cite:
@inproceedings{li2023exploring,
title={Exploring the Effect of Primitives for Compositional Generalization in Vision-and-Language},
author={Li, Chuanhao and Li, Zhen and Jing, Chenchen and Jia, Yunde and Wu, Yuwei},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={19092--19101},
year={2023}
}
- Thanks to the authors of MS-2D-TAN for their great work.
- We use the Charades-CG dataset; thanks to its authors for their work.