ustcfd's Stars
Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on a 24GB RTX 3090/4090.
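For the "Pipeline Parallel" part, a minimal PyTorch sketch of the idea: the layer stack is split across GPUs, and activations cross the device boundary between stages. This is an illustration only; the repo's actual partitioning of Qwen-LM is more involved.

```python
import torch
import torch.nn as nn

class TwoStagePipeline(nn.Module):
    """Toy pipeline-parallel model: half the layers per GPU."""
    def __init__(self, d=1024, n_layers=8):
        super().__init__()
        half = n_layers // 2
        self.stage0 = nn.Sequential(*[nn.Linear(d, d) for _ in range(half)]).to("cuda:0")
        self.stage1 = nn.Sequential(*[nn.Linear(d, d) for _ in range(half)]).to("cuda:1")

    def forward(self, x):
        h = self.stage0(x.to("cuda:0"))
        # Activations cross the device boundary once per (micro-)batch.
        return self.stage1(h.to("cuda:1"))

model = TwoStagePipeline()
out = model(torch.randn(4, 1024))  # needs two CUDA devices to run
```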
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
kq-chen/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports ~100 VLMs and 30+ benchmarks
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
LLaVA-VL/LLaVA-NeXT
Pints-AI/1.5-Pints
A compact LLM pretrained in 9 days on high-quality data
TUDB-Labs/MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
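A hedged sketch of the MoE-LoRA idea under assumed names and shapes (not this repo's API): several LoRA experts share a frozen base linear layer, and a learned router mixes the top-k experts per input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, n_experts=4, r=8, top_k=2):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained layer
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, d_in, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, r, d_out))
        self.router = nn.Linear(d_in, n_experts)
        self.top_k = top_k

    def forward(self, x):                                  # x: (batch, d_in)
        gate = F.softmax(self.router(x), dim=-1)           # (batch, n_experts)
        topv, topi = gate.topk(self.top_k, dim=-1)
        topv = topv / topv.sum(-1, keepdim=True)           # renormalize kept experts
        # Compute all experts' LoRA deltas, then keep only the routed ones.
        delta = torch.einsum("bi,eir,ero->beo", x, self.A, self.B)
        kept = delta.gather(1, topi[..., None].expand(-1, -1, delta.size(-1)))
        return self.base(x) + (kept * topv[..., None]).sum(dim=1)
```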
VT-NLP/MixLoRA
Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024)
OpenGVLab/VisionLLM
VisionLLM Series
TinyLLaVA/TinyLLaVA_Factory
A Framework for Small-scale Large Multimodal Models
maxin-cn/Cinemo
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
kevin-meng/HuggingfaceDownloadShare
How to download Hugging Face models and share the download links
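The download step typically reduces to `snapshot_download` from `huggingface_hub`, which pulls a full model repo into a local directory that can then be archived and shared; the repo id and path below are placeholders, and this repo's own scripts may differ.

```python
from huggingface_hub import snapshot_download

# Downloads every file of the model repo to a shareable local folder.
local_path = snapshot_download(
    repo_id="Qwen/Qwen2-7B-Instruct",  # placeholder model id
    local_dir="./models/qwen2-7b",     # placeholder destination
)
print(local_path)
```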
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
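A hedged sketch of that alignment idea as the description states it: each visual patch produces a probability distribution over a learnable visual vocabulary, and its embedding is the expectation over that table, mirroring how text tokens index a text embedding table. All shapes and names below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_vit, d_model = 8192, 1024, 4096
head = nn.Linear(d_vit, vocab_size)               # patch feature -> visual-token logits
visual_embed = nn.Embedding(vocab_size, d_model)  # learnable visual vocabulary

patches = torch.randn(256, d_vit)                 # one image's ViT features
probs = F.softmax(head(patches), dim=-1)          # (256, vocab_size)
tokens = probs @ visual_embed.weight              # expected embedding: (256, d_model)
```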
Outsider565/LoRA-GA
mst272/simple-lora-plus
A simple implementation of LoRA+: Efficient Low Rank Adaptation of Large Models
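LoRA+ reduces to a single optimizer change: the B matrix trains with a learning rate λ times larger than A (λ is a hyperparameter, often around 16). A minimal sketch with hypothetical tensors:

```python
import torch
import torch.nn as nn

d, r, lr, lora_plus_lambda = 1024, 8, 1e-4, 16.0
A = nn.Parameter(torch.randn(d, r) * 0.01)  # adapter down-projection, base lr
B = nn.Parameter(torch.zeros(r, d))         # adapter up-projection, scaled lr

optimizer = torch.optim.AdamW([
    {"params": [A], "lr": lr},
    {"params": [B], "lr": lr * lora_plus_lambda},  # the LoRA+ trick
])
```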
GaiZhenbiao/Phi3V-Finetuning
Parameter-efficient fine-tuning script for Phi-3-vision, Microsoft's strong multimodal language model.
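A sketch of the usual PEFT recipe such scripts build on: wrap the model so only low-rank adapters train. The target module names are assumptions here; the repo's actual config may differ.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True
)
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"])  # assumed module names
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA weights require grad
```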
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
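The construction step in the title can be pictured as partitioning a dense FFN's intermediate neurons into experts, each a slice of the original weights, which continual pre-training then re-tunes. A toy sketch with LLaMA-7B-like shapes (the repo also handles gate projections and routing):

```python
import torch

d_model, d_ff, n_experts = 4096, 11008, 4
W_up = torch.randn(d_ff, d_model)    # dense up-projection
W_down = torch.randn(d_model, d_ff)  # dense down-projection

slice_size = d_ff // n_experts
experts = [
    (W_up[i * slice_size:(i + 1) * slice_size, :],    # expert's up slice
     W_down[:, i * slice_size:(i + 1) * slice_size])  # matching down slice
    for i in range(n_experts)
]
```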
om-ai-lab/OmAgent
A multimodal agent framework for solving complex tasks [EMNLP'2024]
HJYao00/DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
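A hedged sketch of the dense-connector idea: fuse visual features from several ViT depths (here by channel concatenation) before the usual MLP projector, rather than using the final layer alone. The layer indices are assumptions.

```python
import torch
import torch.nn as nn

n_tokens, d_vit, d_llm = 576, 1024, 4096
hidden_states = [torch.randn(n_tokens, d_vit) for _ in range(24)]  # one per ViT layer

selected = [hidden_states[i] for i in (7, 15, 23)]  # shallow, middle, deep (assumed)
fused = torch.cat(selected, dim=-1)                 # (576, 3 * d_vit)
projector = nn.Sequential(nn.Linear(3 * d_vit, d_llm), nn.GELU(),
                          nn.Linear(d_llm, d_llm))
visual_tokens = projector(fused)                    # ready for the LLM
```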
kyegomez/awesome-multi-agent-papers
A compilation of the best multi-agent papers
CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
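A simplified sketch of the token-compression idea behind such a projector: coarse queries from 2×2 pooling attend back to the original fine-grained tokens, cutting the token count 4×. The paper's point-to-region attention is more structured than this plain cross-attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 1024
tokens = torch.randn(1, 24 * 24, d)                         # 576 visual tokens
grid = tokens.transpose(1, 2).reshape(1, d, 24, 24)
queries = F.avg_pool2d(grid, 2).flatten(2).transpose(1, 2)  # 144 coarse queries

attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
packed, _ = attn(queries, tokens, tokens)                   # (1, 144, d)
```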
LprG6WVR0e/MeteoRA
Code for paper: "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models"
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Tencent/MimicMotion
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
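One plausible reading of "confidence-aware" is a per-pixel weighting of the training loss by pose-estimator confidence, so uncertain keypoint regions contribute less; the sketch below is that reading, not necessarily the repo's exact formulation.

```python
import torch

def confidence_weighted_loss(eps_pred, eps, conf_map):
    # eps_pred, eps: (B, C, H, W) predicted vs. target diffusion noise
    # conf_map:      (B, 1, H, W) pose confidence in [0, 1]
    return ((eps_pred - eps) ** 2 * conf_map).mean()
```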
TMElyralab/MuseTalk
MuseTalk: Real-Time High-Quality Lip Synchronization with Latent Space Inpainting
Kedreamix/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel method of human-AI interaction. 🤝🤖 It integrates technologies such as Whisper, Linly, Microsoft Speech Services, and the SadTalker talking-head generation system. 🌟🔬
PhoenixZ810/MG-LLaVA
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.