CLIP_from_scratch

An implementation of OpenAI's CLIP model from scratch. It uses a ViT as the image encoder and a BERT-like transformer encoder as the text encoder.
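The two encoders are trained jointly with CLIP's symmetric contrastive objective: matching image-text pairs should have high cosine similarity, mismatched pairs low. A minimal NumPy sketch of that loss (function name and shapes are illustrative, not taken from this repo's code):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # L2-normalize so the dot product becomes cosine similarity
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    # Pairwise similarity logits, scaled by temperature
    logits = img @ txt.T / temperature
    n = logits.shape[0]
    # Matching image-text pairs sit on the diagonal
    labels = np.arange(n)

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), labels].mean()

    # Average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 8))  # batch of 4 image embeddings
txt_emb = rng.normal(size=(4, 8))  # matching text embeddings
loss = clip_contrastive_loss(img_emb, txt_emb)
```

In the real model `img_emb` and `txt_emb` would come from the ViT and the BERT-like encoder via learned projection heads, and the temperature is a learned parameter.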

Primary language: Python. License: Apache-2.0.
