Official PyTorch implementation of the ICCV 2023 paper LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval.
- 📣 Sep 2023 - Codes Released.
- 📣 July 2023 - Paper Accepted by ICCV-23.
Codes for Phase 1: Lexicon-Bottlenecked Pre-training
Codes for Phase 2: Momentum Lexicon-Contrastive Pre-training
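For orientation only, here is a minimal, self-contained sketch of the general idea behind lexicon-space (vocabulary-sized) sparse representations and an in-batch contrastive loss over them. This is not the code in this repository (that lives in the phase-1 and phase-2 folders above); the module names, dimensions, and sparsification function below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LexiconProjector(nn.Module):
    """Projects token features into a vocabulary-sized ("lexicon") space and
    max-pools over tokens to obtain one sparse lexicon vector per sample."""

    def __init__(self, hidden_dim=768, vocab_size=30522):
        super().__init__()
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_feats, attention_mask):
        # token_feats: (B, L, H); attention_mask: (B, L) with 1 for real tokens
        token_scores = torch.log1p(F.relu(self.to_vocab(token_feats)))  # (B, L, V), non-negative
        token_scores = token_scores.masked_fill(attention_mask[..., None] == 0, 0.0)
        return token_scores.amax(dim=1)  # (B, V): max-pool over the token dimension


def lexicon_contrastive_loss(img_lex, txt_lex, temperature=0.05):
    """Symmetric in-batch image-text contrastive loss on lexicon representations."""
    img_lex = F.normalize(img_lex, dim=-1)
    txt_lex = F.normalize(txt_lex, dim=-1)
    logits = img_lex @ txt_lex.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    proj = LexiconProjector()
    mask = torch.ones(4, 16, dtype=torch.long)
    img_lex = proj(torch.randn(4, 16, 768), mask)  # dummy encoder outputs
    txt_lex = proj(torch.randn(4, 16, 768), mask)
    print(lexicon_contrastive_loss(img_lex, txt_lex).item())
```

Phase 2's momentum variant would, per its name, additionally maintain momentum-updated encoders, which this sketch omits; see the phase-2 code for the actual implementation.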
You can follow ViLT to get the datasets (GCC, F30K, COCO, and SBU), then organize them into the following structure:
F30k
├── f30k_data
│   ├── xxx.jpg
│   └── ...
├── f30k_test.tsv
├── f30k_val.tsv
└── f30k_train.tsv
Each tsv file should have the following tab-separated format (header row first):
title filepath image_id
The man with... f30k_data/1007129816.jpg 25
A man with... f30k_data/1007129816.jpg 25
...
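As a hedged example (the captions, file names, and paths below are placeholders taken from the snippet above, not from the actual annotations), one way to produce a tsv in this format with Python's csv module:

```python
import csv
import os

# Placeholder rows: (caption, image path relative to the F30k folder, image id).
# In practice these come from the Flickr30k annotations obtained via ViLT.
rows = [
    ("The man with...", "f30k_data/1007129816.jpg", 25),
    ("A man with...", "f30k_data/1007129816.jpg", 25),
]

os.makedirs("F30k", exist_ok=True)
with open(os.path.join("F30k", "f30k_train.tsv"), "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["title", "filepath", "image_id"])  # header row
    writer.writerows(rows)  # one caption per row; an image id repeats per caption
```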
If you find this repository useful, please consider giving it a star ⭐ and citing:
@inproceedings{Luo_2023_ICCV,
  title={LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval},
  author={Ziyang Luo and Pu Zhao and Can Xu and Xiubo Geng and Tao Shen and Chongyang Tao and Jing Ma and Qingwei Lin and Daxin Jiang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2023}
}