This repository provides the official implementation of our ACL 2023 Findings paper, "Learning from Children: Improving Image-Caption Pretraining via Curriculum". The code is built on top of Open Vocabulary Object Detection, and we thank its authors for that valuable project.
Create the environment and set up the data as instructed in this repo, with one exception: install PyTorch 1.2.0 instead of the PyTorch 1.0 nightly build.
Generate curriculum data by running this notebook.
For 4 GPUs:
python -m torch.distributed.launch --nproc_per_node=4 --master_port 6254 tools/train_net.py --skip-test --config-file configs/mmss_rcnn_v01_4x_cur_rs.yaml OUTPUT_DIR runs/
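The same launch line can be adapted to a different number of GPUs by changing `--nproc_per_node`. The snippet below is a minimal sketch that assembles the command for a given GPU count. The helper name `build_train_command` and its defaults are illustrative assumptions, not part of this repository, and the config may need its own adjustments (for example batch size) when the GPU count changes.

```python
import shlex


def build_train_command(n_gpus=4, master_port=6254,
                        config="configs/mmss_rcnn_v01_4x_cur_rs.yaml",
                        output_dir="runs/"):
    """Return the training launch command as an argument list.

    Hypothetical helper for illustration: it only mirrors the command
    shown in this README, with a configurable process count.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={n_gpus}",      # one process per GPU
        "--master_port", str(master_port),
        "tools/train_net.py",
        "--skip-test",
        "--config-file", config,
        "OUTPUT_DIR", output_dir,          # trailing KEY VALUE config override
    ]


if __name__ == "__main__":
    # Print a shell-ready command line for 2 GPUs.
    cmd = build_train_command(n_gpus=2)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The argument list can also be passed directly to `subprocess.run(cmd, check=True)` from the repository root instead of being printed.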
Run the above command with the following changes in configs/mmss_rcnn_v01_4x_cur_rs.yaml:
- CURRICULUM.DO = False
- Comment out MODEL.MMSS_HEAD.GROUNDING.ALIGNMENT_CURRICULUM
You can evaluate with a similar command by running tools/test_net.py instead of tools/train_net.py.
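Since the evaluation command is not spelled out, here is a hedged sketch of how it might be derived from the training command. It assumes that tools/test_net.py accepts the same `--config-file` argument and trailing `KEY VALUE` overrides as tools/train_net.py, and that `--skip-test` is a training-only flag. The helper name `to_eval_command` is hypothetical, so check the script's `--help` output before relying on the result.

```python
import shlex

# Training command copied from this README.
TRAIN_COMMAND = (
    "python -m torch.distributed.launch --nproc_per_node=4 --master_port 6254 "
    "tools/train_net.py --skip-test "
    "--config-file configs/mmss_rcnn_v01_4x_cur_rs.yaml OUTPUT_DIR runs/"
)


def to_eval_command(train_command):
    """Derive an evaluation command line from a training command line.

    Hypothetical helper. Assumes test_net.py shares train_net.py's
    --config-file argument and does not accept --skip-test.
    """
    eval_args = []
    for arg in shlex.split(train_command):
        if arg == "--skip-test":
            continue  # assumed to be specific to training
        if arg == "tools/train_net.py":
            arg = "tools/test_net.py"  # swap in the evaluation script
        eval_args.append(arg)
    return " ".join(shlex.quote(arg) for arg in eval_args)


if __name__ == "__main__":
    print(to_eval_command(TRAIN_COMMAND))
```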
This repository is released under the MIT license. See LICENSE for additional details.