REACT: Learning Customized Visual Models with Retrieval-Augmented Knowledge (CVPR 2023, Highlight 2.5%)
Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee*, Chunyuan Li*
[Project Page] [Paper]
- Introducing a customization stage to the lifecycle of foundation models!
- REACT customizes foundation models for downstream tasks without the need for any labeled data.
- [2023.03.29] Codebase and checkpoints are released.
- [2023.03.25] Our paper is selected as a highlight (2.5% acceptance rate)!
- [2023.03.24] Our new checkpoint based on OpenCLIP-G/14 achieves 81.0% zero-shot accuracy on ImageNet, the new SOTA among public checkpoints!
- [2023.02.28] Paper is accepted to CVPR 2023.
- [2023.01.17] REACT paper is released.
REACT provides a pipeline that builds an index over a large-scale dataset and efficiently queries it to retrieve data relevant to a downstream task, using information as simple as the task's class names. See react_retrieval for details.
If you want to focus on building customized models for standard benchmarks such as ImageNet-1K and ELEVATER, you may skip this step and directly use our retrieved indices.
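The core retrieval idea can be sketched in a few lines. This is a minimal sketch assuming an exact cosine-similarity k-nearest-neighbor search over precomputed embeddings: each class name becomes a text query, and the deduplicated union of every query's top-k neighbors forms the retrieved set. All vectors, prompt strings, and function names below are hypothetical toy values; a pipeline at LAION scale would rely on an approximate index (for example one built with FAISS or Autofaiss) instead of a brute-force loop, and react_retrieval may use a different query modality or scoring rule.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_index(embeddings):
    """Toy 'index': store L2-normalized copies of the pool's embeddings."""
    return [normalize(v) for v in embeddings]


def search(index, query, k):
    """Exact brute-force kNN: indices of the k stored vectors most similar to `query`."""
    q = normalize(query)
    scored = [(sum(a * b for a, b in zip(q, v)), i) for i, v in enumerate(index)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def retrieve_for_classes(index, class_queries, k):
    """Union of the top-k neighbors of every class query, deduplicated and sorted."""
    retrieved = set()
    for query in class_queries.values():
        retrieved.update(search(index, query, k))
    return sorted(retrieved)


# Hypothetical 2-D embeddings of a tiny pool of image-text pairs.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
# Hypothetical text embeddings of prompts built from class names.
class_queries = {"a photo of a cat.": [1.0, 0.05], "a photo of a dog.": [0.05, 1.0]}

index = build_index(pool)
print(retrieve_for_classes(index, class_queries, k=2))  # [0, 1, 2, 3]
```

Deduplicating across classes keeps a pair that is close to several related classes from being counted more than once in the retrieved training set.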
REACT proposes locked-text gated-image tuning, an efficient and effective method for tuning a customized model on the retrieved dataset, which improves ImageNet performance by up to 5.4%. See react_customization for details.
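The gating idea behind gated-image tuning can be illustrated with a toy example. This sketch assumes, in the style of Flamingo's gated blocks, that the output of each newly inserted module is scaled by tanh(alpha) with alpha initialized to 0, while the pretrained weights stay frozen; the class and function names are hypothetical, the "layers" are simple list operations rather than attention, and none of it is the react_customization code.

```python
import math


class GatedResidual:
    """Hypothetical gated residual block: y = x + tanh(alpha) * f(x).

    `f` stands in for a newly inserted trainable module and `alpha` for its
    learnable scalar gate. With alpha = 0 the block is an exact identity, so
    the frozen model's features pass through unchanged at initialization.
    """

    def __init__(self, f, alpha=0.0):
        self.f = f
        self.alpha = alpha  # scalar gate, initialized closed

    def __call__(self, x):
        gate = math.tanh(self.alpha)
        return [xi + gate * fi for xi, fi in zip(x, self.f(x))]


def frozen_layer(x):
    # Toy stand-in for a pretrained image-encoder layer whose weights stay locked.
    return [2.0 * xi for xi in x]


def new_module(x):
    # Toy stand-in for the newly added trainable module.
    return [xi + 1.0 for xi in x]


block = GatedResidual(new_module)
h = frozen_layer([0.5, -1.0])  # frozen features: [1.0, -2.0]
print(block(h))                # [1.0, -2.0]: gate closed, output unchanged
block.alpha = 0.5              # e.g. after some training updates
print(block(h))                # now adds tanh(0.5) * new_module(h)
```

Because the block starts as an identity, the customized model initially reproduces the pretrained model's behavior, and the new modules only take effect as training opens the gate.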
ImageNet zero-shot accuracy (%) of baseline and REACT-customized models ("--": not available):

| Model | Baseline | REACT (Locked-Text), LAION-400M | REACT (Gated-Image), LAION-400M | REACT (Gated-Image), LAION-2B |
|---|---|---|---|---|
| CLIP (B32, WIT-400M) | 63.2 | 66.9 (hf) | 68.6 (hf) | -- |
| OpenCLIP (B32, L-400M) | 62.9 | 65.7 (hf) | 66.4 (hf) | -- |
| OpenCLIP (B32, L-2B) | 66.6 | 67.5 (hf) | 69.5 (hf) | -- |
| CLIP (B16, WIT-400M) | 68.6 | 71.6 (hf) | 73.4 (hf) | -- |
| CLIP (L14, WIT-400M) | 75.3 | -- | 78.1 (hf) | 79.8 (hf) |
| OpenCLIP (L14, L-2B) | 75.3 | -- | 76.4 (hf) | 78.6 (hf) |
| OpenCLIP (G14, L-2B) | 80.1 | -- | -- | 81.0 (hf) |
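The "up to 5.4%" figure can be checked against the table. The short script below takes each row's best available REACT result, subtracts the baseline, and reports the absolute gain in percentage points; the numbers are copied from the table, with `None` standing in for "--".

```python
# (baseline, [Locked-Text L-400M, Gated-Image L-400M, Gated-Image L-2B]) from the table; None = "--".
results = {
    "CLIP (B32, WIT-400M)": (63.2, [66.9, 68.6, None]),
    "OpenCLIP (B32, L-400M)": (62.9, [65.7, 66.4, None]),
    "OpenCLIP (B32, L-2B)": (66.6, [67.5, 69.5, None]),
    "CLIP (B16, WIT-400M)": (68.6, [71.6, 73.4, None]),
    "CLIP (L14, WIT-400M)": (75.3, [None, 78.1, 79.8]),
    "OpenCLIP (L14, L-2B)": (75.3, [None, 76.4, 78.6]),
    "OpenCLIP (G14, L-2B)": (80.1, [None, None, 81.0]),
}


def best_gain(baseline, react_scores):
    """Gain in percentage points of the best available REACT result over the baseline."""
    return max(s for s in react_scores if s is not None) - baseline


# Round to one decimal to match the table's precision and hide float noise.
gains = {name: round(best_gain(b, s), 1) for name, (b, s) in results.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain}")
print("largest gain:", max(gains.values()))  # largest gain: 5.4
```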
```bibtex
@inproceedings{liu2023react,
  author    = {Liu, Haotian and Son, Kilho and Yang, Jianwei and Liu, Ce and Gao, Jianfeng and Lee, Yong Jae and Li, Chunyuan},
  title     = {Learning Customized Visual Models with Retrieval-Augmented Knowledge},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}
```
We are grateful for the contributions of several open-source projects, including CLIP, OpenCLIP, LAION.AI, FAISS, Autofaiss, img2dataset, and ELEVATER.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.