ustcfd's Stars
Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
SkyworkAI/Vitron
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, and Editing
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
AdrianBZG/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
apple/corenet
CoreNet: A library for training deep neural networks
AILab-CVC/SEED
Official implementation of SEED-LLaMA (ICLR 2024).
LlamaFamily/Llama-Chinese
Llama Chinese community. The Llama 3 online demo and fine-tuned models are open, the latest Llama 3 learning resources are aggregated in real time, and all code has been updated for Llama 3. Aims to build the best Chinese Llama model; fully open source and commercially usable.
taishan1994/Llama3.1-Finetuning
Full-parameter, LoRA, and QLoRA fine-tuning of Llama 3.
stanfordnlp/pyreft
ReFT: Representation Finetuning for Language Models
deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
sail-sg/lorahub
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
uukuguy/multi_loras
Load multiple LoRA modules simultaneously and automatically select the combination of LoRA modules that produces the best answer for each user query.
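The composition idea behind repos like multi_loras and LoraHub can be sketched as a weighted sum of low-rank updates merged into a base weight. A minimal NumPy sketch of the general pattern (function name and the fixed weights are illustrative; LoraHub, for example, searches for the coefficients rather than fixing them):

```python
import numpy as np

def compose_loras(W, loras, weights):
    # Merge several LoRA modules by summing their weighted low-rank updates
    delta = sum(w * (A @ B) for w, (A, B) in zip(weights, loras))
    return W + delta

rng = np.random.default_rng(0)
W = rng.standard_normal((6, 6))                       # base weight matrix
loras = [(rng.standard_normal((6, 2)),                # A factor, rank 2
          rng.standard_normal((2, 6)))                # B factor
         for _ in range(3)]
# In practice the weights would come from a router or an optimizer;
# they are fixed here purely for illustration.
merged = compose_loras(W, loras, weights=[0.5, 0.3, 0.2])
assert merged.shape == W.shape
```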
Leeroo-AI/mergoo
A library for easily merging multiple LLM experts and efficiently training the merged LLM.
thunlp/LLaVA-UHD
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
google-deepmind/recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
Suikasxt/PMG
Repository for the paper "Personalized Multimodal Response Generation with Large Language Models"
LingyvKong/OneChart
[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"
yfeng95/PoseGPT
Ivan-Tang-3D/Any2Point
[ECCV2024] Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
NVIDIA/NeMo-Aligner
Scalable toolkit for efficient model alignment
forhaoliu/ringattention
Transformers with Arbitrarily Large Context
YuchenLiu98/COMM
PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"
eric-ai-lab/MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
ytongbai/LVM
GraphPKU/PiSSA
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
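PiSSA's core idea fits in a few lines: initialize the adapter factors from the top-r singular values and vectors of the pretrained weight and keep the frozen residual, so the model's output is unchanged at initialization while the principal directions become trainable. A minimal NumPy sketch (function name and shapes are illustrative, not the repo's API):

```python
import numpy as np

def pissa_init(W, r):
    # SVD of the pretrained weight matrix
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Principal components initialize the trainable low-rank factors
    A = U[:, :r] * np.sqrt(S[:r])            # shape (d_out, r)
    B = np.sqrt(S[:r])[:, None] * Vt[:r]     # shape (r, d_in)
    # Frozen residual; W_res + A @ B reproduces W exactly at init
    W_res = W - A @ B
    return A, B, W_res

W = np.random.randn(8, 8)
A, B, W_res = pissa_init(W, r=2)
assert np.allclose(W_res + A @ B, W)
```

Because A @ B equals the rank-r truncation of W, fine-tuning A and B adapts the weight's principal subspace while the residual stays fixed.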
BlinkDL/RWKV-LM
RWKV is an RNN with transformer-level LLM performance that can be trained directly like a GPT (parallelizable). It combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" context length, and free sentence embeddings.
BAAI-DCAI/DataOptim
A collection of visual instruction tuning datasets.
csuhan/OneLLM
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language