lvlm

There are 23 repositories under the lvlm topic.

  • NVlabs/EAGLE

    Eagle: Frontier Vision-Language Models with Data-Centric Strategies

Language: Python
  • YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).

Language: HTML
  • Hon-Wong/VoRA

[Fully open] [Encoder-free MLLM] Vision as LoRA (a LoRA sketch follows at the end of this list)

Language: Python
  • zhaochen0110/OpenThinkIMG

    OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

Language: Jupyter Notebook
  • MMStar-Benchmark/MMStar

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

Language: Python
  • NishilBalar/Awesome-LVLM-Hallucination

An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models.

  • thu-nics/FrameFusion

[ICCV'25] The official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models" (a token-reduction sketch follows at the end of this list)

Language: Python
  • wang2226/Awesome-LLM-Decoding

    📜 Paper list on decoding methods for LLMs and LVLMs

  • w1oves/hqclip

    [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets

  • OpenSparseLLMs/CLIP-MoE

    CLIP-MoE: Mixture of Experts for CLIP

Language: Python
  • hasanar1f/HiRED

[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for high-resolution vision-language models (e.g., LLaVA-Next) under a fixed token budget (a token-dropping sketch follows at the end of this list).

Language: Python
  • The-Martyr/Awesome-Multimodal-Reasoning

Latest advances in (RL-based) multimodal reasoning and generation in multimodal large language models.

  • tsinghua-fib-lab/SmartAgent

The official repository for "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".

  • fan19-hub/LEMMA

LEMMA: An effective and explainable approach to detecting multimodal misinformation with an LVLM and external knowledge augmentation, leveraging the intuition and reasoning capability inside the LVLM.

Language: Jupyter Notebook
  • Sreyan88/VDGD

Code for the ICLR 2025 paper: Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs

Language: Python
  • CharlieDDDD/AISurveyPapers

Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey

  • top-yun/SPARK

A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor vision-language models.

Language: Python
  • Wu0409/Antidote

    [CVPR'25] Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception

Language: Python
  • Ruiyang-061X/Awesome-MLLM-Reasoning

📖 Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and slow-thinking methods.

  • UBSec/UGCG-Guard

Code for the USENIX Security 2024 paper: Moderating Illicit Online Image Promotion for Unsafe User Generated Content Games Using Large Vision-Language Models.

Language: Python
  • patrickamadeus/vqa-nle-llava

A novel approach that leverages LVLMs to efficiently generate high-quality synthetic VQA-NLE datasets.

Language: Python
  • camilochs/visgraphvar

    VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models

Language: Python
  • codewithdark-git/TalkTube

    A powerful Streamlit application that allows users to analyze and interact with YouTube video content through natural language questions.

Language: Python
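
A minimal sketch of the LoRA mechanism behind "Vision as LoRA" (Hon-Wong/VoRA above): instead of a separate vision encoder, low-rank adapters inside the LLM's linear layers are trained to absorb visual input. This illustrates plain LoRA only; the class name, rank, and scaling below are assumptions, not VoRA's actual architecture.

# Generic LoRA layer: frozen pretrained weight plus a trainable low-rank update.
# Illustrative only; not the VoRA implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():    # freeze the pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # start as an identity update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(1, 77, 4096))       # only lora_a/lora_b get gradients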
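A rough sketch of the idea named in thu-nics/FrameFusion above: combine token similarity and token importance for video token reduction. The merge and prune rules below are simplified assumptions for illustration, not the paper's algorithm; `importance` stands in for attention-derived scores.

# Step 1 merges temporally adjacent, highly similar tokens; step 2 prunes the
# least important survivors down to a budget. Illustrative sketch only.
import torch
import torch.nn.functional as F

def reduce_video_tokens(tokens, importance, sim_threshold=0.9, keep_ratio=0.5):
    """tokens: (T, dim) frame tokens in temporal order; importance: (T,) scores."""
    sims = F.cosine_similarity(tokens[:-1], tokens[1:], dim=-1)
    merged, scores = [tokens[0]], [importance[0]]
    for i in range(1, tokens.size(0)):
        if sims[i - 1] > sim_threshold:     # similar to predecessor: merge
            merged[-1] = (merged[-1] + tokens[i]) / 2
            scores[-1] = torch.maximum(scores[-1], importance[i])
        else:                               # distinct enough: keep separately
            merged.append(tokens[i])
            scores.append(importance[i])
    merged, scores = torch.stack(merged), torch.stack(scores)
    budget = max(1, int(keep_ratio * merged.size(0)))
    keep = torch.topk(scores, k=budget).indices.sort().values
    return merged[keep]                     # reduced tokens, temporal order kept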
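A minimal sketch of budget-constrained visual-token dropping as described for hasanar1f/HiRED above: score each visual token, keep only the top `budget` tokens, and preserve spatial order. The attention-derived scores are an assumption here; this is not the official implementation.

# Keep the `budget` highest-scoring visual tokens under a fixed token budget.
import torch

def drop_visual_tokens(tokens: torch.Tensor, scores: torch.Tensor,
                       budget: int) -> torch.Tensor:
    """tokens: (num_tokens, dim) visual embeddings; scores: (num_tokens,)."""
    keep = torch.topk(scores, k=min(budget, tokens.size(0))).indices
    keep, _ = keep.sort()                   # preserve original spatial order
    return tokens[keep]

# Example: reduce 576 LLaVA-Next-style patch tokens to a budget of 128.
tokens = torch.randn(576, 4096)
scores = torch.rand(576)                    # stand-in for attention importance
reduced = drop_visual_tokens(tokens, scores, budget=128)
print(reduced.shape)                        # torch.Size([128, 4096])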