vision-language-models

There are 26 repositories under the vision-language-models topic.

  • baaivision/EVE

    [NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models

Language: Python
  • snap-research/MyVLM

    Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)

Language: Python
  • baaivision/DenseFusion

    DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Language: Python
  • BAAI-Agents/GPA-LM

This repo is a live list of papers on game playing with large multimodal models, accompanying the survey "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".

  • NishilBalar/Awesome-LVLM-Hallucination

An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models (LVLMs)

  • yu-rp/apiprompting

    [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

Language: Python
  • elkhouryk/RS-TransCLIP

    [ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"

Language: Python
  • erfanshayegani/Jailbreak-In-Pieces

[ICLR 2024 Spotlight 🔥] [Best Paper Award, SoCal NLP 2023 🏆] "Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models"

Language: Python
  • vanillaer/CPL-ICML2024

[ICML 2024] Official code repository for the paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"

Language: Python
  • lezhang7/SAIL

    [Under review] Assessing and Learning Alignment of Unimodal Vision and Language Models

Language: Jupyter Notebook
  • ytaek-oh/fsc-clip

    [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality

Language: Python
  • danelpeng/Awesome-Continual-Leaning-with-PTMs

A curated list of research on continual learning with pretrained models.

  • jiayuww/SpatialEval

[NeurIPS'24] SpatialEval: a benchmark for evaluating the spatial reasoning abilities of MLLMs and LLMs

Language: Python
  • ytaek-oh/awesome-vl-compositionality

Awesome Vision-Language Compositionality: a comprehensive curation of research papers from the literature.

  • chu0802/SnD

Official implementation of "Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models", accepted to ECCV 2024

  • sled-group/COMFORT

    Repo for the paper "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"

Language: Python
  • sitamgithub-MSIT/PicQ

    PicQ: Demo for MiniCPM-V 2.6 to answer questions about images using natural language.

Language: Python
  • Shengwei-Peng/TOCFL-MultiBench

TOCFL-MultiBench: a multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features a Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.

Language: Python
  • sitamgithub-MSIT/VidiQA

    VidiQA: Demo for MiniCPM-V 2.6 to answer questions about videos using natural language.

Language: Python
  • zwenyu/colearn-plus

Code for "Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training" [IJCV 2024] and "Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation" [ICCV 2023]

Language: Python
  • akskuchi/dHM-visual-storytelling

"Not (yet) the Whole Story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition" (EMNLP 2024 Findings)

Language: Python
  • andrewliao11/Q-Spatial-Bench-code

Official repository for the paper "Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models"

Language: Python
  • hironcode/farcer

    🚀 IEEE UEMCON 2024 "Fully Autoregressive Multimodal LLM for Contextual Emotion Recognition"

Language: Python
  • YuweiYin/Code4Chart

    Code4Chart: Using Visualization Code to Improve Chart Understanding for Vision-Language Models

Language: Python
  • Ibtissam-SAADI/CLIVP-FER

Facial expression recognition using vision-language models (VLMs)

Language: Python
  • sitamgithub-MSIT/paligemma2-docci

Image captioning with the PaliGemma 2 vision-language model.

Language: Python