This repo collects research resources for CLIP (Contrastive Language-Image Pre-Training), proposed by OpenAI. If you would like to contribute, please open an issue or a pull request.
- Learning Transferable Visual Models From Natural Language Supervision [paper][code] (see the usage sketch after this list)
- CLIP: Connecting Text and Images [blog]
- Multimodal Neurons in Artificial Neural Networks [blog]
- OpenCLIP (3rd-party, PyTorch) [code]
- Train-CLIP (3rd-party, PyTorch) [code]
- Paddle-CLIP (3rd-party, PaddlePaddle) [code]
- VQGAN-CLIP [code]
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [paper][code]
- CLIP Guided Diffusion [code]
- CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions [paper]
- Roboflow Zero-shot Object Tracking [code]
- Zero-Shot Detection via Vision and Language Knowledge Distillation [paper][code*]
- Unsplash Image Search [code]
- CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval [paper][code]
- Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [paper][code]
- VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding [code]
- CLIP prefix captioning [code]
- HairCLIP: Design Your Hair by Text and Reference Image [code]
- Crop-CLIP [code]
- CLIPstyler: Image Style Transfer with a Single Text Condition [code]
- CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation [paper]
- Wav2CLIP: Learning Robust Audio Representations From CLIP [code]
- CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotation [paper]
- RegionCLIP: Region-based Language-Image Pretraining [paper]
- CMA-CLIP: Cross-Modality Attention CLIP for Image-Text Classification [paper]
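
For quick reference, here is a minimal zero-shot classification sketch using OpenAI's official `clip` package (linked in the first entry above). It assumes the package is installed; the image path and candidate labels are placeholders.

```python
import torch
import clip
from PIL import Image

# Load a pretrained CLIP model and its matching image preprocessing pipeline.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and a few candidate text labels (placeholders for illustration).
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Similarity logits between the image and each text prompt,
    # converted to probabilities over the candidate labels.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probabilities:", probs)
```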