1349949's Stars
yt-dlp/yt-dlp
A feature-rich command-line audio/video downloader
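A minimal usage sketch of yt-dlp's Python API (assuming `pip install yt-dlp`; the URL and options below are illustrative placeholders, not taken from this list):

```python
# Minimal sketch: download a video's best audio stream via yt-dlp's Python API.
# Assumes `pip install yt-dlp`; the URL below is a placeholder.
from yt_dlp import YoutubeDL

opts = {
    "format": "bestaudio/best",      # prefer the best audio-only stream
    "outtmpl": "%(title)s.%(ext)s",  # name the output file after the video title
}
with YoutubeDL(opts) as ydl:
    ydl.download(["https://example.com/watch?v=placeholder"])
```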
ggerganov/llama.cpp
LLM inference in C/C++
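A minimal inference sketch, going through the third-party llama-cpp-python bindings rather than the C/C++ API directly (the GGUF model path is a placeholder assumption):

```python
# Minimal sketch: text completion via llama-cpp-python (`pip install llama-cpp-python`),
# which wraps llama.cpp. The model path is a placeholder for any local GGUF checkpoint.
from llama_cpp import Llama

llm = Llama(model_path="./models/example.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```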
microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
BradyFU/Awesome-Multimodal-Large-Language-Models
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions of Tokens of Context
PKU-YuanGroup/ChatLaw
ChatLaw: a powerful LLM tailored for the Chinese legal domain (Chinese legal large language model)
THUDM/CogVLM
A state-of-the-art open visual language model (multimodal pre-trained model)
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek3, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
SCIR-HI/Huatuo-Llama-Med-Chinese
Repo for BenTsao (original name: HuaTuo, 华驼): instruction-tuning large language models with Chinese medical knowledge.
open-compass/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.
shibing624/MedicalGPT
MedicalGPT: Training your own medical GPT model with a ChatGPT-style training pipeline; implements incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, and ORPO.
ytongbai/LVM
shikras/shikra
mlfoundations/datacomp
DataComp: In search of the next generation of multimodal datasets
BradyFU/Woodpecker
✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
lucidrains/magvit2-pytorch
Implementation of the MagViT2 tokenizer in PyTorch
baaivision/Uni3D
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
magic-research/bubogpt
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
OpenGVLab/all-seeing
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
TonyLianLong/LLM-groundedDiffusion
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LMD, TMLR 2024)
OpenDriveLab/ST-P3
[ECCV 2022] ST-P3: an end-to-end vision-based autonomous driving framework via spatial-temporal feature learning.
LinShan-Bin/OccNeRF
Code for "OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments".
tsb0601/MMVP
bytedance/lynx-llm
Paper: https://arxiv.org/abs/2307.02469 | Project page: https://lynx-llm.github.io/
FreedomIntelligence/Huatuo-26M
The largest-scale Chinese medical QA dataset, with 26,000,000 question-answer pairs.
E2E-AD/AD-MLP
wudongming97/Prompt4Driving
[AAAI 2025] Language Prompt for Autonomous Driving
mynameischaos/Lion
Lion: Kindling Vision Intelligence within Large Language Models
will-singularity/Skywork-MM
Empirical Study Towards Building An Effective Multi-Modal Large Language Model