SeuTao's Stars
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
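For reference, a minimal sketch of image-prompted inference in the style of the repo's README, assuming the sam2 package is installed and a checkpoint has been downloaded; the checkpoint path, config name, and image file below are illustrative assumptions.

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Illustrative paths: point these at a downloaded SAM 2 checkpoint and its matching config.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Load an RGB image as a numpy array (example.jpg is a placeholder).
image = np.array(Image.open("example.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # A single foreground point prompt; predict() returns candidate masks with scores.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```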
xzhih/one-key-hidpi
Enable macOS HiDPI with one click and get native-like display settings.
bklieger-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
pytorch/torchtitan
A native PyTorch Library for large model training
X-PLUG/mPLUG-Owl
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
zou-group/textgrad
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
karpathy/nano-llama31
nanoGPT-style version of Llama 3.1
chaidiscovery/chai-lab
Chai-1, SOTA model for biomolecular structure prediction
showlab/Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
BAAI-DCAI/Bunny
A family of lightweight multimodal models.
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
bfshi/scaling_on_scales
When do we not need larger vision models?
Oryx-mllm/Oryx
MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
OpenGVLab/OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
yuweihao/MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
TIGER-AI-Lab/Mantis
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
JUNJIE99/MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
lqtrung1998/mwp_ReFT
sail-sg/regmix
🧬 RegMix: Data Mixture as Regression for Language Model Pre-training
scenarios/WeMM
alonj/Same-Task-More-Tokens
The code for the paper: "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models"
xverse-ai/XVERSE-MoE-A36B
XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.