llava

There are 160 repositories under the llava topic.

  • ollama/ollama

    Get up and running with Llama 3.3, Mistral, Gemma 2, and other large language models.

    Language: Go
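
    As a rough sketch of how a LLaVA-class model served by Ollama can be queried, the snippet below uses Ollama's Python client; it assumes the ollama package is installed, a local Ollama server is running, and a llava model has already been pulled (the image path is a placeholder):

      import ollama  # pip install ollama; talks to a running local Ollama server

      # Send an image plus a question to a locally pulled LLaVA model.
      # "photo.jpg" is a placeholder path on the local machine.
      response = ollama.chat(
          model="llava",
          messages=[{
              "role": "user",
              "content": "Describe this image in one sentence.",
              "images": ["photo.jpg"],
          }],
      )
      # Dict-style access to the reply; newer client versions also allow
      # response.message.content.
      print(response["message"]["content"])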
  • haotian-liu/LLaVA

    [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

    Language: Python
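
    For a flavor of the visual instruction-following this targets, here is a minimal sketch using the Hugging Face transformers port of LLaVA-1.5 rather than this repository's own CLI (the model id, prompt template, and image URL are assumptions):

      import requests
      from PIL import Image
      from transformers import AutoProcessor, LlavaForConditionalGeneration

      model_id = "llava-hf/llava-1.5-7b-hf"  # community HF port of LLaVA-1.5
      processor = AutoProcessor.from_pretrained(model_id)
      model = LlavaForConditionalGeneration.from_pretrained(model_id)

      # Placeholder image URL; the prompt follows the LLaVA-1.5 chat format.
      image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)
      prompt = "USER: <image>\nWhat is unusual about this picture? ASSISTANT:"

      inputs = processor(images=image, text=prompt, return_tensors="pt")
      output = model.generate(**inputs, max_new_tokens=100)
      print(processor.decode(output[0], skip_special_tokens=True))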
  • sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Language: Python
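
    To give a sense of the serving workflow, a minimal sketch: SGLang can expose an OpenAI-compatible endpoint that the standard openai client can talk to. The launch command, port, and model name below are assumptions based on typical usage:

      # Assumed server launch, run separately in a shell:
      #   python -m sglang.launch_server --model-path <model-path> --port 30000
      from openai import OpenAI

      # Point the standard OpenAI client at the local SGLang server.
      client = OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="EMPTY")
      resp = client.chat.completions.create(
          model="default",  # placeholder; SGLang serves whichever model was launched
          messages=[{"role": "user", "content": "Explain what a vision language model is."}],
      )
      print(resp.choices[0].message.content)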
  • Fanghua-Yu/SUPIR

    SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai.

    Language: Python
  • InternLM/xtuner

    An efficient, flexible, and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

    Language: Python
  • modelscope/data-juicer

    Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷

    Language: Python
  • yuanzhoulvpi2017/zero_nlp

    Chinese NLP solutions (large models, data, models, training, inference)

    Language: Jupyter Notebook
  • SciSharp/LLamaSharp

    A C#/.NET library to run LLM (🦙LLaMA/LLaVA) on your local device efficiently.

    Language: C#
  • chenking2020/FindTheChatGPTer

    ChatGPT's explosive popularity marked a key step toward AGI. This project collects open-source alternatives to ChatGPT, including large text models and large multimodal models, as a convenient reference.

  • open-compass/VLMEvalKit

    Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

    Language: Python
  • om-ai-lab/OmAgent

    A Multimodal Language Agent Framework for Smart Devices and More

    Language: Python
  • mbzuai-oryx/Video-ChatGPT

    [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

    Language: Python

  • unum-cloud/uform

    Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

    Language: Python
  • mbzuai-oryx/LLaVA-pp

    🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

    Language: Python
  • jhc13/taggui

    Tag manager and captioner for image datasets

    Language: Python
  • PsyChip/machina

    OpenCV + YOLO + LLaVA powered video surveillance system

    Language: Python
  • TinyLLaVA/TinyLLaVA_Factory

    A Framework of Small-scale Large Multimodal Models

    Language: Python
  • Blaizzy/mlx-vlm

    MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

    Language: Python
  • SkalskiP/awesome-foundation-and-multimodal-models

    👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]

    Language: Python
  • NVlabs/EAGLE

    EAGLE: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

    Language: Python
  • gokayfem/awesome-vlm-architectures

    Famous Vision Language Models and Their Architectures

    Language: Markdown
  • nrl-ai/llama-assistant

    AI-powered assistant to help you with your daily tasks, powered by Llama 3.2. It can recognize your voice, process natural language, and perform various actions based on your commands: summarizing text, rephrasing sentences, answering questions, writing emails, and more.

    Language: Python
  • gokayfem/ComfyUI_VLM_nodes

    Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

    Language: Python
  • PaddlePaddle/PaddleMIX

    Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrained models and a diffusion model toolbox, with high performance and flexibility.

    Language: Python

  • apocas/restai

    RESTai is an AIaaS (AI as a Service) open-source platform. Built on top of LlamaIndex & Langchain. Supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama/vLLM/etc. Precise embeddings usage and tuning. Built-in image generation (Dall-E, SD, Flux) with dynamically loaded generators.

    Language: Python
  • jakobdylanc/llmcord

    Make Discord your LLM frontend ● Supports any OpenAI compatible API (Ollama, LM Studio, vLLM, OpenRouter, xAI, Mistral, Groq and more)

    Language: Python
  • xiaoachen98/Open-LLaVA-NeXT

    An open-source implementation for training LLaVA-NeXT.

    Language: Python
  • InternLM/InternEvo

    InternEvo is an open-source, lightweight training framework that aims to support model pre-training without the need for extensive dependencies.

    Language: Python
  • WisconsinAIVision/ViP-LLaVA

    [CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

    Language: Python
  • developersdigest/ai-devices

    AI Device Template Featuring Whisper, TTS, Groq, Llama3, OpenAI and more

    Language: TypeScript
  • FuxiaoLiu/LRV-Instruction

    [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

    Language: Python
  • RLHF-V/RLAIF-V

    RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

    Language: Python
  • SALT-NLP/LLaVAR

    Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

    Language: Python
  • tianyi-lab/HallusionBench

    [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

    Language: Python
  • mbzuai-oryx/VideoGPT-plus

    Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

    Language: Python
  • zjysteven/lmms-finetune

    A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.

    Language: Python