IvanAer/G-Universal-CLIP

4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level Recognition workshop at ECCV 2022

Jupyter Notebook, MIT license

General Image Descriptors for Open World Image Retrieval using ViT CLIP @ ECCV 2022

4th place solution - Google Universal Image Embedding Kaggle Challenge

Instance-Level Recognition workshop

Marcos V. Conde, Ivan Aerlic, Simon Jégou

News 🚀🚀

[10/2022] We open-sourced a Kaggle notebook that achieves 0.603 on the private LB in a zero-shot manner (no data), leveraging CLIP ViT-H, GPT-3, and PCA.
[10/2022] The paper will be available by 17th October.
[10/2022] 4th place solution! Setting up this repo.
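
As a rough illustration of the PCA step mentioned in the zero-shot notebook entry above, the sketch below reduces a matrix of image embeddings to a small descriptor and L2-normalizes the result. It is not the notebook's code: the function names `fit_pca` and `project`, the 1024-dimensional input, the 64-dimensional output, and the random stand-in features are all assumptions made for the example, and the CLIP and GPT-3 parts of the pipeline are not covered.

```python
import numpy as np


def fit_pca(embeddings: np.ndarray, n_components: int):
    """Return the mean and the top principal directions of the embeddings."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(embeddings: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project onto the principal directions and L2-normalize each row."""
    reduced = (embeddings - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.clip(norms, 1e-12, None)


# Random stand-in for image embeddings (assumed 1024-d); real features
# would come from the image encoder instead.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1024))

mean, components = fit_pca(features, n_components=64)  # 64 is an assumed target size
descriptors = project(features, mean, components)
```

The PCA is fitted once on a reference set of embeddings, and the same `mean` and `components` are then reused to project query and index images, so that all descriptors live in the same reduced space.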

Structure

We use code and pre-trained models from the amazing open_clip repo!

soup.ipynb - Model soups script. Idea from mlfoundations' WiSE-FT and "Robust fine-tuning of zero-shot models"
train_vit_h_224.ipynb - Train ViT-H/14 pre-trained on LAION-2B
train_vit_l_336.ipynb - Train ViT-L/14 pre-trained on LAION-2B
utilities.py - General utilities!
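
For readers unfamiliar with the idea behind soup.ipynb, here is a minimal sketch of weight averaging, assuming checkpoints are plain dictionaries that map parameter names to values. It does not reproduce the notebook: `uniform_soup`, `interpolate_weights`, and the toy checkpoints are hypothetical names and data chosen for illustration. The first function averages several fine-tuned checkpoints (a uniform "soup"); the second linearly interpolates between a zero-shot and a fine-tuned checkpoint in the spirit of WiSE-FT.

```python
def uniform_soup(state_dicts):
    """Average a list of checkpoints parameter by parameter."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints must share the same parameter names")
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}


def interpolate_weights(zero_shot, fine_tuned, alpha):
    """Blend two checkpoints: alpha=0 gives zero_shot, alpha=1 gives fine_tuned."""
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    return {k: (1 - alpha) * zero_shot[k] + alpha * fine_tuned[k] for k in zero_shot}


# Toy checkpoints with scalar "weights"; real ones would hold tensors.
ckpt_a = {"w": 1.0, "b": 0.0}
ckpt_b = {"w": 3.0, "b": 2.0}

soup = uniform_soup([ckpt_a, ckpt_b])
blend = interpolate_weights(ckpt_a, ckpt_b, alpha=0.25)
```

Because the functions rely only on addition and scalar multiplication, the same code should also work when the dictionary values are NumPy arrays or PyTorch tensors, provided all checkpoints come from the same architecture.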

Contact

Feel free to contact us if you have suggestions or inquiries about this work: marcos.conde-osorio@uni-wuerzburg.de and ivanaer@outlook.com. Please add "google challenge" in the email subject.

@article{conde2022general,
  title={General image descriptors for open world image retrieval using vit clip},
  author={Conde, Marcos V and Aerlic, Ivan and J{\'e}gou, Simon},
  journal={arXiv preprint arXiv:2210.11141},
  year={2022}
}