lvlm
There are 23 repositories under the lvlm topic.
NVlabs/EAGLE
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Hon-Wong/VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
zhaochen0110/OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
MMStar-Benchmark/MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"
NishilBalar/Awesome-LVLM-Hallucination
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucinations in large vision-language models.
thu-nics/FrameFusion
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
wang2226/Awesome-LLM-Decoding
📜 Paper list on decoding methods for LLMs and LVLMs
w1oves/hqclip
[ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets
OpenSparseLLMs/CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
hasanar1f/HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.
The-Martyr/Awesome-Multimodal-Reasoning
Latest Advances in (RL-based) Multimodal Reasoning and Generation in Multimodal Large Language Models
tsinghua-fib-lab/SmartAgent
The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".
fan19-hub/LEMMA
LEMMA: An effective and explainable way to detect multimodal misinformation with an LVLM and external knowledge augmentation, drawing on the intuition and reasoning capability of the LVLM.
Sreyan88/VDGD
Code for ICLR 2025 Paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
CharlieDDDD/AISurveyPapers
Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey
top-yun/SPARK
A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor vision-language models.
Wu0409/Antidote
[CVPR'25] Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Ruiyang-061X/Awesome-MLLM-Reasoning
📖 Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
UBSec/UGCG-Guard
Code for USENIX Security 2024 paper: Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.
patrickamadeus/vqa-nle-llava
A novel approach that leverages LVLMs to efficiently generate high-quality synthetic VQA-NLE datasets.
camilochs/visgraphvar
VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models
codewithdark-git/TalkTube
A powerful Streamlit application that allows users to analyze and interact with YouTube video content through natural language questions.