llava

There are 213 repositories under llava topic.

ollama/ollama
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Language:Go152k 857 8.2k13.2k
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Language:Python23.6k 160 1.6k2.6k
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language:Python17.9k 117 3.2k2.9k
Fanghua-Yu/SUPIR
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.
Language:Python5.2k 67 154442
yuanzhoulvpi2017/zero_nlp
中文nlp解决方案(大模型、数据、模型、训练、推理)
Language:Jupyter Notebook3.6k 30 210431
SciSharp/LLamaSharp
A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.
Language:C#3.4k 62 434463
open-compass/VLMEvalKit
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
Language:Python3.1k 13 485497
om-ai-lab/OmAgent
Build multimodal language agents for fast prototype and production
Language:Python2.5k 132 31281
chenking2020/FindTheChatGPTer
ChatGPT爆火，开启了通往AGI的关键一步，本项目旨在汇总那些ChatGPT的开源平替们，包括文本大模型、多模态大模型等，为大家提供一些便利
2k 56 8201
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
Language:Python1.6k 22 259172
mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Language:Python1.4k 15 127119
unum-cloud/uform
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Language:Python1.2k 15 3876
jhc13/taggui
Tag manager and captioner for image datasets
Language:Python1.1k 17 23551
gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
Language:Markdown1k 16 452
TinyLLaVA/TinyLLaVA_Factory
A Framework of Small-scale Large Multimodal Models
Language:Python895 12 17394
NVlabs/EAGLE
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
Language:Python865 31 3648
mbzuai-oryx/LLaVA-pp
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
Language:Python841 10 3461
PsyChip/machina
OpenCV+YOLO+LLAVA powered video surveillance system
Language:Python775 3 135
PaddlePaddle/PaddleMIX
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Language:Python697 24 198218
SkalskiP/awesome-foundation-and-multimodal-models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Language:Python634 27 545
gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Language:Python527 7 12951
ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Language:Python524 10 2828
nrl-ai/llama-assistant
AI-powered assistant to help you with your daily tasks, powered by Llama 3, DeepSeek R1, and many more models on HuggingFace.
Language:Python515 12 1544
apocas/restai
RESTai is an AIaaS (AI as a Service) open-source platform. Built on top of LlamaIndex & Langchain. Supports any public LLM supported by LlamaIndex and any local LLM suported by Ollama/vLLM/etc. Precise embeddings usage and tuning. Built-in image generation (Dall-E, SD, Flux) and dynamic loading generators.
Language:Python431 9 1882
yuanze-lin/Olympus
[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Language:Python425 4 028
xiaoachen98/Open-LLaVA-NeXT
An open-source implementation for training LLaVA-NeXT.
Language:Python419 11 3122
RLHF-V/RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
Language:Python411 6 4117
InternLM/InternEvo
InternEvo is an open-sourced lightweight training framework aims to support model pre-training without the need for extensive dependencies.
Language:Python407 10 9169
zli12321/Vision-Language-Models-Overview
A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.
368 2 017
zjysteven/lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
Language:Python333 8 6238
WisconsinAIVision/ViP-LLaVA
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Language:Python331 6 2821
tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Language:Python297 5 138
KolosalAI/Kolosal
Kolosal AI is an OpenSource and Lightweight alternative to LM Studio to run LLMs 100% offline on your device.
Language:C++296 6 2924
developersdigest/ai-devices
AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more
Language:TypeScript293 3 041
FuxiaoLiu/LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Language:Python286 13 2415
mbzuai-oryx/VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Language:Python286 6 2719

llava

ollama/ollama

haotian-liu/LLaVA

sgl-project/sglang

Fanghua-Yu/SUPIR

yuanzhoulvpi2017/zero_nlp

SciSharp/LLamaSharp

open-compass/VLMEvalKit

om-ai-lab/OmAgent

chenking2020/FindTheChatGPTer

Blaizzy/mlx-vlm

mbzuai-oryx/Video-ChatGPT

unum-cloud/uform

jhc13/taggui

gokayfem/awesome-vlm-architectures

TinyLLaVA/TinyLLaVA_Factory

NVlabs/EAGLE

mbzuai-oryx/LLaVA-pp

PsyChip/machina

PaddlePaddle/PaddleMIX

SkalskiP/awesome-foundation-and-multimodal-models

gokayfem/ComfyUI_VLM_nodes

ictnlp/LLaVA-Mini

nrl-ai/llama-assistant

apocas/restai

yuanze-lin/Olympus

xiaoachen98/Open-LLaVA-NeXT

RLHF-V/RLAIF-V

InternLM/InternEvo

zli12321/Vision-Language-Models-Overview

zjysteven/lmms-finetune

WisconsinAIVision/ViP-LLaVA

tianyi-lab/HallusionBench

KolosalAI/Kolosal

developersdigest/ai-devices

FuxiaoLiu/LRV-Instruction

mbzuai-oryx/VideoGPT-plus