WasedaMagina's Stars
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
huggingface/peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
naklecha/llama3-from-scratch
llama3 implemented one matrix multiplication at a time
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
LargeWorldModel/LWM
Large World Model -- modeling text and video with million-token context
CompVis/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
yizhongw/self-instruct
Aligning pretrained language models with instruction data generated by themselves.
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
jingyi0000/VLM_survey
Collection of AWESOME vision-language models for vision tasks
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
apple/ml-4m
4M: Massively Multimodal Masked Modeling
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
markus-perl/ffmpeg-build-script
The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.
tencent-ailab/persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
luogen1996/LaVIN
[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"
soCzech/TransNetV2
TransNet V2: Shot Boundary Detection Neural Network
jianghaojun/Awesome-Parameter-Efficient-Transfer-Learning
A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
mira-space/MiraData
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
thuanz123/enhancing-transformers
An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch
IDEA-Research/MotionLLM
[arXiv 2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
ZrrSkywalker/MathVerse
[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
VT-NLP/MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
icoz69/StableLLAVA
Official repo for StableLLAVA
janghyuncho/DECOLA
Code release for "Language-conditioned Detection Transformer"
yuangpeng/dreambench_plus
Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
longvideobench/LongVideoBench
[NeurIPS '24 D&B] Official dataloader and evaluation scripts for LongVideoBench.
jihaonew/MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
CLUEbenchmark/SuperCLUE-Role
SuperCLUE-Role: a Chinese-native role-playing evaluation benchmark