vision-language-models
There are 26 repositories under the vision-language-models topic.
baaivision/EVE
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
snap-research/MyVLM
Official Implementation for "MyVLM: Personalizing VLMs for User-Specific Queries" (ECCV 2024)
baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
BAAI-Agents/GPA-LM
A live list of papers on game-playing agents and large multimodal models, accompanying the survey "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
NishilBalar/Awesome-LVLM-Hallucination
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models.
yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
elkhouryk/RS-TransCLIP
[ICASSP 2025] Open-source code for the paper "Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification"
erfanshayegani/Jailbreak-In-Pieces
[ICLR 2024 Spotlight 🔥 ] - [ Best Paper Award SoCal NLP 2023 🏆] - Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models
vanillaer/CPL-ICML2024
[ICML 2024] Official code for the ICML 2024 paper "Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data"
lezhang7/SAIL
[Under review] Assessing and Learning Alignment of Unimodal Vision and Language Models
ytaek-oh/fsc-clip
[EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality
danelpeng/Awesome-Continual-Leaning-with-PTMs
A curated list of research on continual learning with pretrained models.
jiayuww/SpatialEval
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
ytaek-oh/awesome-vl-compositionality
Awesome Vision-Language Compositionality: a comprehensive curation of the research literature on vision-language compositionality.
chu0802/SnD
Official implementation of "Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models" (ECCV 2024).
sled-group/COMFORT
Repo for the paper "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities"
sitamgithub-MSIT/PicQ
PicQ: Demo for MiniCPM-V 2.6 to answer questions about images using natural language.
Shengwei-Peng/TOCFL-MultiBench
TOCFL-MultiBench: a multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features a Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.
sitamgithub-MSIT/VidiQA
VidiQA: Demo for MiniCPM-V 2.6 to answer questions about videos using natural language.
zwenyu/colearn-plus
Code for "Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training" (IJCV 2024) and "Rethinking the Role of Pre-Trained Networks in Source-Free Domain Adaptation" (ICCV 2023).
akskuchi/dHM-visual-storytelling
Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition – EMNLP 2024 (Findings)
andrewliao11/Q-Spatial-Bench-code
Official repo of the paper "Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models"
hironcode/farcer
🚀 IEEE UEMCON 2024 "Fully Autoregressive Multimodal LLM for Contextual Emotion Recognition"
YuweiYin/Code4Chart
Code4Chart: Using Visualization Code to Improve Chart Understanding for Vision-Language Models
Ibtissam-SAADI/CLIVP-FER
Facial expression recognition using vision-language models (VLMs).
sitamgithub-MSIT/paligemma2-docci
Image Captioning with PaliGemma 2 Vision Language Model.