vision-language-pretraining

There are 31 repositories under the vision-language-pretraining topic.

  • salesforce/LAVIS

    LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook
  • DAMO-NLP-SG/Video-LLaMA

    [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python
  • deepseek-ai/DeepSeek-VL

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python
  • mbzuai-oryx/Video-ChatGPT

    [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python
  • Sense-GVT/DeCLIP

    Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Language: Python
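The contrastive language-image objective that DeCLIP makes more data-efficient is the standard CLIP-style symmetric InfoNCE loss over paired image/text embeddings. A minimal numpy sketch of that baseline objective (not DeCLIP's augmented multi-supervision version; shapes and the `temperature` default are illustrative):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Illustrative sketch of the CLIP-style baseline; not taken from the
    DeCLIP codebase.
    """
    # L2-normalize so dot products are cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # matched pairs sit on the diagonal

    def cross_entropy(z, y):
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Perfectly aligned pairs (identical image and text embeddings) drive the loss toward zero, while randomly paired embeddings leave it near log of the batch size.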
  • TXH-mercury/VALOR

    Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python
  • mbzuai-oryx/VideoGPT-plus

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Language: Python
  • sail-sg/ptp

[CVPR 2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"

Language: Python
  • Surrey-UP-Lab/RegionSpot

    Recognize Any Regions

Language: Python
  • ArrowLuo/SegCLIP

    PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

Language: Python
  • vgthengane/Continual-CLIP

    Official repository for "CLIP model is an Efficient Continual Learner".

Language: Python
  • jusiro/FLAIR

    FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.

Language: Python
  • marslanm/Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying survey accepted at ACM Computing Surveys: https://dl.acm.org/doi/abs/10.1145/3617833

  • megvii-research/protoclip

📍 Official PyTorch implementation of the paper "ProtoCLIP: Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)

Language: Python
  • Zoky-2020/SGA

    Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]

Language: Python
  • HieuPhan33/CVPR2024_MAVL

    Multi-Aspect Vision Language Pretraining - CVPR2024

Language: Python
  • TXH-mercury/COSA

    Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Language: Python
  • TencentARC/FLM

    Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)

Language: Python
  • yiren-jian/BLIText

    [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python
  • alinlab/b2t

    Bias-to-Text: Debiasing Unknown Visual Biases through Language Interpretation

Language: Python
  • omipan/svl_adapter

    SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models

Language: Python
  • adarobustness/adaptation_robustness

    Evaluate robustness of adaptation methods on large vision-language models

Language: Shell
  • ChenDelong1999/ITRA

    A codebase for flexible and efficient Image Text Representation Alignment

Language: Python
  • unitaryai/VTC

    VTC: Improving Video-Text Retrieval with User Comments

Language: Python
  • jaisidhsingh/LoRA-CLIP

    Easy wrapper for inserting LoRA layers in CLIP.

Language: Python
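The idea behind a LoRA wrapper like this is to freeze a pretrained linear weight W and learn only a low-rank correction B·A, so the layer computes W·x + (α/r)·B·A·x. A minimal numpy sketch of one such layer (the class name, hyperparameters, and initialization scheme here are illustrative, not LoRA-CLIP's actual API):

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) B A x.

    Illustrative sketch of the LoRA technique; not the LoRA-CLIP wrapper itself.
    """
    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, rank))                     # zero-init up-projection:
                                                               # the layer starts identical
                                                               # to the pretrained one
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because B is zero-initialized, the wrapped layer initially reproduces the pretrained layer exactly; only A and B receive gradients during fine-tuning.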
  • ahmdtaha/distributed_sigmoid_loss

Unofficial implementation of "Sigmoid Loss for Language Image Pre-Training"

Language: Python
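The sigmoid loss replaces CLIP's batch-wide softmax with an independent binary classification per image-text pair: matched pairs get label +1, all mismatched pairs −1. A numpy sketch under the paper's formulation (the temperature `t` and bias `b` are normally learnable; the values here follow the paper's initialization, and this is not the repository's implementation):

```python
import numpy as np

def sigmoid_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss, sketched from the SigLIP paper's formulation.

    Illustrative only; `t` (temperature) and `b` (bias) are treated as
    fixed scalars here rather than learnable parameters.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = t * (img @ txt.T) + b                # (B, B) pairwise logits
    n = len(logits)
    z = -np.ones((n, n)) + 2 * np.eye(n)          # +1 on matched pairs, -1 elsewhere
    # -log sigmoid(z * logits), computed stably as log(1 + exp(-z * logits));
    # sum over all pairings of each image, then average over the batch.
    return np.logaddexp(0.0, -z * logits).sum(axis=1).mean()
```

Because every pair contributes independently, the loss needs no gather of global similarities across devices, which is what makes a distributed implementation like this one attractive.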
  • LooperXX/ManagerTower

    Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Language: Python
  • YyzHarry/vlm-fairness

    Demographic Bias of Vision-Language Foundation Models in Medical Imaging

Language: Python
  • BUAADreamer/CCRK

    [KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python
  • xmed-lab/FD-SOS

    MICCAI 2024: Vision-Language Open-Set Detectors for Bone Fenestration and Dehiscence Detection from Intraoral Images

Language: Python
  • unitaryai/VTC-dataset

Language: Python