ustcfd's Stars
Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on a 24GB RTX 3090/4090.
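For the "Pipeline Parallel" part, a minimal PyTorch sketch of the idea: the layer stack is split across GPUs, and activations cross the device boundary between stages. This is an illustration only; the repo's actual partitioning of Qwen-LM is more involved.

```python
import torch
import torch.nn as nn

class TwoStagePipeline(nn.Module):
    """Toy pipeline-parallel model: half the layers per GPU."""
    def __init__(self, d=1024, n_layers=8):
        super().__init__()
        half = n_layers // 2
        self.stage0 = nn.Sequential(*[nn.Linear(d, d) for _ in range(half)]).to("cuda:0")
        self.stage1 = nn.Sequential(*[nn.Linear(d, d) for _ in range(half)]).to("cuda:1")

    def forward(self, x):
        h = self.stage0(x.to("cuda:0"))
        # Activations cross the device boundary once per (micro-)batch.
        return self.stage1(h.to("cuda:1"))

model = TwoStagePipeline()
out = model(torch.randn(4, 1024))  # needs two CUDA devices to run
```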
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
kq-chen/VLMEvalKit
Open-source evaluation toolkit for large vision-language models (LVLMs); supports ~100 VLMs and 30+ benchmarks
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
LLaVA-VL/LLaVA-NeXT
Pints-AI/1.5-Pints
A compact LLM pretrained in 9 days on high-quality data
TUDB-Labs/MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
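A hedged sketch of the MoE-LoRA idea under assumed names and shapes (not this repo's API): several LoRA experts share a frozen base linear layer, and a learned router mixes the top-k experts per input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, n_experts=4, r=8, top_k=2):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained layer
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(n_experts, d_in, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_experts, r, d_out))
        self.router = nn.Linear(d_in, n_experts)
        self.top_k = top_k

    def forward(self, x):                                  # x: (batch, d_in)
        gate = F.softmax(self.router(x), dim=-1)           # (batch, n_experts)
        topv, topi = gate.topk(self.top_k, dim=-1)
        topv = topv / topv.sum(-1, keepdim=True)           # renormalize kept experts
        # Compute all experts' LoRA deltas, then keep only the routed ones.
        delta = torch.einsum("bi,eir,ero->beo", x, self.A, self.B)
        kept = delta.gather(1, topi[..., None].expand(-1, -1, delta.size(-1)))
        return self.base(x) + (kept * topv[..., None]).sum(dim=1)
```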
VT-NLP/MixLoRA
Multimodal Instruction Tuning with Conditional Mixture of LoRA (ACL 2024)
OpenGVLab/VisionLLM
VisionLLM Series
TinyLLaVA/TinyLLaVA_Factory
A Framework for Small-scale Large Multimodal Models
maxin-cn/Cinemo
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
kevin-meng/HuggingfaceDownloadShare
How to download Hugging Face models and share the download links
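The download step typically reduces to `snapshot_download` from `huggingface_hub`, which pulls a full model repo into a local directory that can then be archived and shared; the repo id and path below are placeholders, and this repo's own scripts may differ.

```python
from huggingface_hub import snapshot_download

# Downloads every file of the model repo to a shareable local folder.
local_path = snapshot_download(
    repo_id="Qwen/Qwen2-7B-Instruct",  # placeholder model id
    local_dir="./models/qwen2-7b",     # placeholder destination
)
print(local_path)
```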
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
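A hedged sketch of that alignment idea as the description states it: each visual patch produces a probability distribution over a learnable visual vocabulary, and its embedding is the expectation over that table, mirroring how text tokens index a text embedding table. All shapes and names below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_vit, d_model = 8192, 1024, 4096
head = nn.Linear(d_vit, vocab_size)               # patch feature -> visual-token logits
visual_embed = nn.Embedding(vocab_size, d_model)  # learnable visual vocabulary

patches = torch.randn(256, d_vit)                 # one image's ViT features
probs = F.softmax(head(patches), dim=-1)          # (256, vocab_size)
tokens = probs @ visual_embed.weight              # expected embedding: (256, d_model)
```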
Outsider565/LoRA-GA
mst272/simple-lora-plus
A simple implementation of LoRA+: Efficient Low Rank Adaptation of Large Models
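LoRA+ reduces to a single optimizer change: the B matrix trains with a learning rate λ times larger than A (λ is a hyperparameter, often around 16). A minimal sketch with hypothetical tensors:

```python
import torch
import torch.nn as nn

d, r, lr, lora_plus_lambda = 1024, 8, 1e-4, 16.0
A = nn.Parameter(torch.randn(d, r) * 0.01)  # adapter down-projection, base lr
B = nn.Parameter(torch.zeros(r, d))         # adapter up-projection, scaled lr

optimizer = torch.optim.AdamW([
    {"params": [A], "lr": lr},
    {"params": [B], "lr": lr * lora_plus_lambda},  # the LoRA+ trick
])
```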
GaiZhenbiao/Phi3V-Finetuning
Parameter-efficient fine-tuning script for Phi-3-vision, Microsoft's strong multimodal language model.
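A sketch of the usual PEFT recipe such scripts build on: wrap the model so only low-rank adapters train. The target module names are assumptions here; the repo's actual config may differ.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-vision-128k-instruct", trust_remote_code=True
)
config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                    target_modules=["q_proj", "v_proj"])  # assumed module names
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the LoRA weights require grad
```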
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
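The construction step in the title can be pictured as partitioning a dense FFN's intermediate neurons into experts, each a slice of the original weights, which continual pre-training then re-tunes. A toy sketch with LLaMA-7B-like shapes (the repo also handles gate projections and routing):

```python
import torch

d_model, d_ff, n_experts = 4096, 11008, 4
W_up = torch.randn(d_ff, d_model)    # dense up-projection
W_down = torch.randn(d_model, d_ff)  # dense down-projection

slice_size = d_ff // n_experts
experts = [
    (W_up[i * slice_size:(i + 1) * slice_size, :],    # expert's up slice
     W_down[:, i * slice_size:(i + 1) * slice_size])  # matching down slice
    for i in range(n_experts)
]
```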
om-ai-lab/OmAgent
A multimodal agent framework for solving complex tasks [EMNLP'2024]
HJYao00/DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
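A hedged sketch of the dense-connector idea: fuse visual features from several ViT depths (here by channel concatenation) before the usual MLP projector, rather than using the final layer alone. The layer indices are assumptions.

```python
import torch
import torch.nn as nn

n_tokens, d_vit, d_llm = 576, 1024, 4096
hidden_states = [torch.randn(n_tokens, d_vit) for _ in range(24)]  # one per ViT layer

selected = [hidden_states[i] for i in (7, 15, 23)]  # shallow, middle, deep (assumed)
fused = torch.cat(selected, dim=-1)                 # (576, 3 * d_vit)
projector = nn.Sequential(nn.Linear(3 * d_vit, d_llm), nn.GELU(),
                          nn.Linear(d_llm, d_llm))
visual_tokens = projector(fused)                    # ready for the LLM
```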
kyegomez/awesome-multi-agent-papers
A compilation of the best multi-agent papers
CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
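A simplified sketch of the token-compression idea behind such a projector: coarse queries from 2×2 pooling attend back to the original fine-grained tokens, cutting the token count 4×. The paper's point-to-region attention is more structured than this plain cross-attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d = 1024
tokens = torch.randn(1, 24 * 24, d)                         # 576 visual tokens
grid = tokens.transpose(1, 2).reshape(1, d, 24, 24)
queries = F.avg_pool2d(grid, 2).flatten(2).transpose(1, 2)  # 144 coarse queries

attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
packed, _ = attn(queries, tokens, tokens)                   # (1, 144, d)
```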
LprG6WVR0e/MeteoRA
Code for paper: "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models"
lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Tencent/MimicMotion
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
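One plausible reading of "confidence-aware" is a per-pixel weighting of the training loss by pose-estimator confidence, so uncertain keypoint regions contribute less; the sketch below is that reading, not necessarily the repo's exact formulation.

```python
import torch

def confidence_weighted_loss(eps_pred, eps, conf_map):
    # eps_pred, eps: (B, C, H, W) predicted vs. target diffusion noise
    # conf_map:      (B, 1, H, W) pose confidence in [0, 1]
    return ((eps_pred - eps) ** 2 * conf_map).mean()
```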
TMElyralab/MuseTalk
MuseTalk: Real-Time High-Quality Lip Synchronization with Latent Space Inpainting
Kedreamix/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel method of human-AI interaction. 🤝🤖 It integrates technologies such as Whisper, Linly, Microsoft Speech Services, and the SadTalker talking-head generation system. 🌟🔬
PhoenixZ810/MG-LLaVA
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.