clip

There are 713 repositories under the clip topic.

OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Language: Python 4.7k 37 341479
marqo-ai/marqo
Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
Language: Python 4.7k 38 242195
easychen/pushdeer
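Open-source push notification service that needs no app: on iOS 14+, scan a code and it works. Also supports Quick Apps, iOS and Mac clients, Android clients, and self-made devices.
Language: C 4.7k 42 164475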
CVHub520/X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
Language: Python 4.5k 36 712508
open-mmlab/mmpretrain
OpenMMLab Pre-training Toolbox and Benchmark
Language: Python 3.5k 30 7841.1k
yuanzhoulvpi2017/zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)
Language: Jupyter Notebook 3.1k 30 201375
pharmapsychotic/clip-interrogator
Image to prompt with BLIP and CLIP
Language: Python 2.7k 31 99429
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
2.6k 125 10227
rom1504/clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
Language: Jupyter Notebook 2.4k 25 233216
RuffianZhong/RWidgetHelper
Rapid Android UI development that tames the quirks of native widgets
Language: Java 1.9k 30 121171
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Language: Python 1.8k 21 70118
roboflow/awesome-openai-vision-api-experiments
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
Language: Python 1.7k 26 5133
open-compass/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Language: Python 1.5k 11 254218
mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Language: Python 1.3k 15 122110
yzhuoning/Awesome-CLIP
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
1.2k 19 1557
unum-cloud/uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Language: Python 1.1k 15 3063
EdVince/Stable-Diffusion-NCNN
Stable Diffusion in NCNN with C++, supporting txt2img and img2img
Language: C++ 1k 26 4597
haltakov/natural-language-image-search
Search photos on Unsplash using natural language
Language: Jupyter Notebook 990 10 12103
haltakov/natural-language-youtube-search
Search inside YouTube videos using natural language
Language: Jupyter Notebook 918 14 672
ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Language: Python 898 13 110125
omerbt/Text2LIVE
Official Pytorch Implementation for "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)
Language: Python 885 29 2279
hila-chefer/Transformer-MM-Explainability
[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Language: Jupyter Notebook 810 8 36107
eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video
Language: Python 777 22 37103
SkyWorkAIGC/SkyPaint-AI-Diffusion
An optimized text-to-image model based on Stable Diffusion. It accepts both Chinese and English text inputs and can generate high-quality images in several modern art styles.
667 11 439
pengsongyou/openscene
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
Language: Python 666 19 9148
Sense-GVT/DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Language: Python 641 20 2931
leondgarse/keras_cv_attention_models
Keras beit,caformer,CMT,CoAtNet,convnext,davit,dino,efficientdet,edgenext,efficientformer,efficientnet,eva,fasternet,fastervit,fastvit,flexivit,gcvit,ghostnet,gpvit,hornet,hiera,iformer,inceptionnext,lcnet,levit,maxvit,mobilevit,moganet,nat,nfnets,pvt,swin,tinynet,tinyvit,uniformer,volo,vanillanet,yolor,yolov7,yolov8,yolox,gpt2,llama2, alias kecam
Language: Python 604 23 7795
SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Language: Python 590 26 444
pablosichert/react-truncate
React component for truncating multi-line spans and adding an ellipsis.
Language: JavaScript 587 9 94129
v-iashin/video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
Language: Python 551 6 7697
gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
Language: Markdown 515 12 325
keshiim/ZMJImageEditor
ZMJImageEditor is an image-editing component similar to the one in WeChat. It is powerful and easy to integrate, supporting drawing, text, rotation, cropping, stickers, and other functions.
Language: Objective-C 504 18 23103
monatis/clip.cpp
CLIP inference in plain C/C++ with no extra dependencies
Language: C++ 472 16 5636
cliport/cliport
CLIPort: What and Where Pathways for Robotic Manipulation
Language: Jupyter Notebook 468 6 3783
harperreed/photo-similarity-search
Super simple MLX (apple silicon) CLIP based photo similarity web app
Language: Python 458 4 535
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Language: Python 416 22 160163
