In this repo, we show how to train a self-supervised model using a Global Contrastive Loss (GCL) on CC3M, a widely used bimodal image-text dataset.
Try in Colab: https://colab.research.google.com/drive/1FTF-cTcW11Gyrwu8uhTZOXgLsjp49Z9W?usp=sharing
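The training objective builds on a standard two-way (image-to-text and text-to-image) contrastive loss. As a rough illustration only, here is a minimal NumPy sketch of that base loss; the repo's actual SogCLR/iSogCLR implementation differs (it maintains moving-average estimators of the denominator and, for iSogCLR, per-sample temperatures):

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, tau=0.01):
    """Two-way contrastive loss for matched image-text pairs.

    img_emb, txt_emb: (n, d) L2-normalized embeddings; row i of each
    is a matched pair. tau: temperature.
    """
    sim = img_emb @ txt_emb.T / tau  # (n, n) similarity matrix

    def nll_diag(logits):
        # Negative log-likelihood of the diagonal (positive) entries
        # under a row-wise softmax, with max-subtraction for stability.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (nll_diag(sim) + nll_diag(sim.T))
```

The loss is smaller when each image embedding is most similar to its own caption than to any other caption in the batch.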
Set up a new virtual environment with Conda:

```shell
env_name='csce689_proj'
conda create -n "$env_name" python=3.10
conda activate "$env_name"
pip install -r requirements.txt
```
- Download the data:
  - `cc3m_subset_100k.tar.gz`: a 100k subset of the Conceptual Captions dataset;
  - `mscoco_val.tar.gz`: a 5k subset of the COCO val2014 dataset;
  - `clip_train.tar.gz`: captions of the previous datasets.

  The code and data should be structured as follows:

  ```
  .
  +--bimodal_exps (code)
  |
  +--clip_train (captions)
  |  +--cc3m_train_subset.json
  |  +--coco_val.json
  |
  +--datasets (images)
     +--cc3m_subset_100k
     +--mscoco_val
  ```
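Assuming the three archives have been downloaded into the repo root, a small Python sketch to produce the layout above (the destination paths are an assumption based on the tree; adjust them if the archive contents differ):

```python
import tarfile
from pathlib import Path

def extract(archive, dest):
    """Extract a .tar.gz archive into dest, creating dest if needed."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(path=dest)

# Assumed mapping of archive -> destination directory.
layout = {
    "cc3m_subset_100k.tar.gz": "datasets",
    "mscoco_val.tar.gz": "datasets",
    "clip_train.tar.gz": ".",
}
for archive, dest in layout.items():
    if Path(archive).exists():  # skip archives not yet downloaded
        extract(archive, dest)
```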
- To train a model on cc3m, use `run.slurm` if Slurm is available; otherwise run:

  ```shell
  export PYTHONPATH="$PYTHONPATH:./bimodal_exps"
  export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'

  data_path=./datasets
  ann_path=./clip_train
  train_image_root=cc3m_subset_100k/
  data=cc3m
  train_file=${data}_train_subset.json
  gamma=0.8
  epochs=30

  CUDA_VISIBLE_DEVICES=0 python ./bimodal_exps/clip.py \
      --data_path ${data_path} \
      --ann_path ${ann_path} \
      --train_file ${train_file} \
      --train_image_root ${train_image_root} \
      --output_dir output/isogclr_${data}_g${gamma}_e${epochs} \
      --init_model \
      --use_amp \
      --ita_type sogclr \
      --tau_init 0.01 \
      --sogclr_gamma ${gamma} \
      --eta_init 0.03 --sched cosine \
      --no-distributed \
      --epochs ${epochs}
  ```
- To test the performance of a model on mscoco, use `eval.slurm` if Slurm is available; otherwise run:

  ```shell
  export PYTHONPATH="$PYTHONPATH:./bimodal_exps"
  export HUGGINGFACE_HUB_CACHE='./checkpoints/huggingface'

  data_path=./datasets
  ann_path=./clip_train
  train_image_root=cc3m_subset_100k/
  data=cc3m
  train_file=${data}_train_subset.json
  gamma=0.8
  epochs=30

  CUDA_VISIBLE_DEVICES=0 python ./bimodal_exps/clip.py \
      --data_path ${data_path} \
      --ann_path ${ann_path} \
      --train_file ${train_file} \
      --train_image_root ${train_image_root} \
      --output_dir output/isogclr_${data}_g${gamma}_e${epochs} \
      --init_model \
      --use_amp \
      --ita_type sogclr \
      --tau_init 0.01 \
      --sogclr_gamma ${gamma} \
      --eta_init 0.03 --sched cosine \
      --no-distributed \
      --epochs ${epochs} \
      --evaluate --checkpoint './output/isogclr_cc3m_g0.8_e30/checkpoint_30.pth'
  ```
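Evaluation on MSCOCO reports cross-modal retrieval quality, typically as Recall@K. For intuition, a minimal NumPy sketch of image-to-text Recall@K (assuming one matching caption per image at the same index, which may differ from the repo's actual evaluation protocol):

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """Image-to-text Recall@K from an (n, n) similarity matrix,
    assuming the matching caption for image i is caption i."""
    ranks = (-sim).argsort(axis=1)           # captions sorted best-first per image
    target = np.arange(sim.shape[0])[:, None]
    hits = ranks == target                   # one True per row
    pos = hits.argmax(axis=1)                # rank of the true caption
    return {k: float((pos < k).mean()) for k in ks}
```

Text-to-image recall is obtained the same way from the transposed similarity matrix.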
If you find this tutorial helpful, please cite:
@inproceedings{qiu2023not,
title={Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization},
author={Qiu, Zi-Hao and Hu, Quanqi and Yuan, Zhuoning and Zhou, Denny and Zhang, Lijun and Yang, Tianbao},
booktitle={International Conference on Machine Learning},
pages={TBD},
year={2023},
organization={PMLR}
}