
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

Awesome CLIP

This repo collects research resources on CLIP (Contrastive Language-Image Pre-Training), proposed by OpenAI. If you would like to contribute, please open an issue or a pull request.

CLIP

  • Learning Transferable Visual Models From Natural Language Supervision [paper][code]
  • CLIP: Connecting Text and Images [blog]
  • Multimodal Neurons in Artificial Neural Networks [blog]
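For orientation, here is a minimal pure-Python sketch of the symmetric contrastive objective described in the CLIP paper. For a batch of N matched image-text pairs, the N×N cosine-similarity matrix is divided by a temperature, and cross-entropy is applied along the rows (image→text) and the columns (text→image), with the diagonal as the correct match. The embeddings are assumed to come from any image and text encoder. The names `clip_loss`, `l2_normalize` and `cross_entropy`, as well as the fixed `temperature=0.07`, are illustrative choices. The paper learns the temperature instead of fixing it.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over N matched (image, text) pairs.

    Row i of image_emb and row i of text_emb form the positive pair; every
    other combination in the batch serves as a negative. The temperature is
    fixed here for simplicity (an assumption; CLIP learns it).
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: row i should select column i.
    loss_img = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text -> image: column j should select row j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_txt = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_img + loss_txt) / 2
```

With toy embeddings, correctly paired inputs give a loss close to zero. Swapping the captions makes every diagonal entry a mismatch, so the loss rises to roughly 1/temperature.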

Training

  • OpenCLIP (3rd-party, PyTorch) [code]
  • Train-CLIP (3rd-party, PyTorch) [code]
  • Paddle-CLIP (3rd-party, PaddlePaddle) [code]

Applications

GAN

  • VQGAN-CLIP [code]
  • StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery [paper][code]
  • CLIP Guided Diffusion [code]
  • CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions [paper]

Object Detection

  • Roboflow Zero-shot Object Tracking [code]
  • Zero-Shot Detection via Vision and Language Knowledge Distillation [paper][code*]

Information Retrieval

  • Unsplash Image Search [code]
  • CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval [paper][code]
  • Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling [paper][code]
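Several of the retrieval projects above follow a common pattern: they embed a text query and a gallery of images in a shared space and rank the gallery by cosine similarity. The sketch below shows only that ranking step. The vectors are hypothetical toy embeddings standing in for encoder outputs, and `search` and `cosine` are illustrative helper names, not the API of any listed project.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query_emb, gallery, top_k=3):
    """Return the names of the top_k gallery items most similar to the query.

    gallery maps an item name to its (hypothetical) precomputed embedding.
    """
    ranked = sorted(gallery, key=lambda name: cosine(query_emb, gallery[name]), reverse=True)
    return ranked[:top_k]


# Toy usage with made-up 3-d embeddings.
gallery = {"dog": [1.0, 0.0, 0.0], "cat": [0.9, 0.1, 0.0], "car": [0.0, 0.0, 1.0]}
print(search([1.0, 0.0, 0.0], gallery, top_k=2))  # ['dog', 'cat']
```

Precomputing and storing the gallery embeddings once means that each query requires only a single text encoding plus a similarity scan.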

Video Understanding

  • VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding [code]

Image Captioning

  • CLIP prefix captioning [code]

Image Editing

  • HairCLIP: Design Your Hair by Text and Reference Image [code]
  • Crop-CLIP [code]
  • CLIPstyler: Image Style Transfer with a Single Text Condition [code]

Text-to-3D Generation

  • CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation [paper]

Representation Learning

  • Wav2CLIP: Learning Robust Audio Representations From CLIP [code]
  • CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotation [paper]
  • RegionCLIP: Region-based Language-Image Pretraining [paper]
  • CMA-CLIP: Cross-Modality Attention CLIP for Image-Text Classification [paper]

Others

  • Multilingual-CLIP [code]
  • CLIP (With Haiku + Jax!) [code]
  • CLIP-Event: Connecting Text and Images with Event Structures [paper][code]

Acknowledgment

Awesome Visual-Transformer