clip

There are 713 repositories under the clip topic.

  • OFA-Sys/Chinese-CLIP

    Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

    Language:Python4.7k37341479
  • marqo-ai/marqo

    Unified embedding generation and search engine. Also available in the cloud at cloud.marqo.ai

    Language:Python4.7k38242195
  • easychen/pushdeer

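    Open-source push notification service that needs no app: on iOS 14+, scan a code and it is ready to use. Also supports quick apps, iOS and Mac clients, Android clients, and self-made devices.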

    Language:C4.7k42164475
  • CVHub520/X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything and other awesome models.

    Language:Python4.5k36712508
  • open-mmlab/mmpretrain

    OpenMMLab Pre-training Toolbox and Benchmark

    Language:Python3.5k307841.1k
  • yuanzhoulvpi2017/zero_nlp

    Chinese NLP solutions (large models, data, models, training, inference)

    Language:Jupyter Notebook3.1k30201375
  • pharmapsychotic/clip-interrogator

    Image to prompt with BLIP and CLIP

    Language:Python2.7k3199429
  • jingyi0000/VLM_survey

    Collection of AWESOME vision-language models for vision tasks

  • rom1504/clip-retrieval

    Easily compute CLIP embeddings and build a CLIP retrieval system with them

    Language:Jupyter Notebook2.4k25233216
  • RuffianZhong/RWidgetHelper

    Rapid Android UI development that tames the quirks of native widgets

    Language:Java1.9k30121171
  • cambrian-mllm/cambrian

    Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

    Language:Python1.8k2170118
  • roboflow/awesome-openai-vision-api-experiments

    Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

    Language:Python1.7k265133
  • open-compass/VLMEvalKit

    Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

    Language:Python1.5k11254218
  • mbzuai-oryx/Video-ChatGPT

    [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

    Language:Python1.3k15122110
  • yzhuoning/Awesome-CLIP

    Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

  • unum-cloud/uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

    Language:Python1.1k153063
  • EdVince/Stable-Diffusion-NCNN

    Stable Diffusion in NCNN with C++, supporting txt2img and img2img

    Language:C++1k264597
  • haltakov/natural-language-image-search

    Search photos on Unsplash using natural language

    Language:Jupyter Notebook9901012103
  • haltakov/natural-language-youtube-search

    Search inside YouTube videos using natural language

    Language:Jupyter Notebook91814672
  • ArrowLuo/CLIP4Clip

    An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

    Language:Python89813110125
  • omerbt/Text2LIVE

    Official PyTorch implementation of "Text2LIVE: Text-Driven Layered Image and Video Editing" (ECCV 2022 Oral)

    Language:Python885292279
  • hila-chefer/Transformer-MM-Explainability

    [ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

    Language:Jupyter Notebook810836107
  • eps696/aphantasia

    CLIP + FFT/DWT/RGB = text to image/video

    Language:Python7772237103
  • SkyWorkAIGC/SkyPaint-AI-Diffusion

    An optimized AI painting (text-to-image) model based on Stable Diffusion. It accepts both Chinese and English text input and can generate high-quality images in several modern art styles.

  • pengsongyou/openscene

    [CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies

    Language:Python666199148
  • Sense-GVT/DeCLIP

    Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

    Language:Python641202931
  • leondgarse/keras_cv_attention_models

    Keras beit,caformer,CMT,CoAtNet,convnext,davit,dino,efficientdet,edgenext,efficientformer,efficientnet,eva,fasternet,fastervit,fastvit,flexivit,gcvit,ghostnet,gpvit,hornet,hiera,iformer,inceptionnext,lcnet,levit,maxvit,mobilevit,moganet,nat,nfnets,pvt,swin,tinynet,tinyvit,uniformer,volo,vanillanet,yolor,yolov7,yolov8,yolox,gpt2,llama2, alias kecam

    Language:Python604237795
  • SkalskiP/awesome-foundation-and-multimodal-models

    👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

    Language:Python59026444
  • pablosichert/react-truncate

    React component for truncating multi-line spans and adding an ellipsis.

    Language:JavaScript587994129
  • v-iashin/video_features

    Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

    Language:Python55167697
  • gokayfem/awesome-vlm-architectures

    Famous Vision Language Models and Their Architectures

    Language:Markdown51512325
  • keshiim/ZMJImageEditor

    ZMJImageEditor is an image editing component like the one in WeChat. It is powerful and easy to integrate, supporting drawing, text, rotation, cropping, stickers, and other functions.

    Language:Objective-C5041823103
  • monatis/clip.cpp

    CLIP inference in plain C/C++ with no extra dependencies

    Language:C++472165636
  • cliport/cliport

    CLIPort: What and Where Pathways for Robotic Manipulation

    Language:Jupyter Notebook46863783
  • harperreed/photo-similarity-search

    Super simple MLX (Apple silicon) CLIP-based photo similarity web app

    Language:Python4584535
  • PaddlePaddle/PaddleMIX

    Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

    Language:Python41622160163