Czi24's Stars
PKU-Alignment/align-anything
Align Anything: Training All-modality Model with Feedback
Deep-Agent/R1-V
Witness the aha moment of VLM with less than $3.
EvolvingLMMs-Lab/open-r1-multimodal
A fork to add multimodal model training to open-r1
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
getAsterisk/deepclaude
A high-performance LLM inference API and Chat UI that integrates DeepSeek R1's CoT reasoning traces with Anthropic Claude models.
Jiayi-Pan/TinyZero
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
huggingface/open-r1
Fully open reproduction of DeepSeek-R1
modelscope/awesome-deep-reasoning
A collection of awesome work on R1!
agentica-project/deepscaler
Democratizing Reinforcement Learning for LLMs
schuy1er/EWF_official
Official code for "Endpoints Weight Fusion for Class Incremental Semantic Segmentation"
MrGiovanni/ContinualLearning
[MICCAI 2023] Continual Learning for Abdominal Multi-Organ and Tumor Segmentation
arthurdouillard/CVPR2021_PLOP
Official code of CVPR 2021's PLOP: Learning without Forgetting for Continual Semantic Segmentation
simplescaling/s1
s1: Simple test-time scaling
shawnricecake/Heima
Code for Heima
DAMO-NLP-SG/DiGIT
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
The-AI-Alliance/GEO-Bench-VLM
GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks
SegmentationBLWX/cssegmentation
CSSegmentation: An Open Source Continual Semantic Segmentation Toolbox Based on PyTorch.
LMM101/Awesome-Multimodal-Next-Token-Prediction
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
meta-llama/llama
Inference code for Llama models
mbzuai-oryx/LlamaV-o1
Rethinking Step-by-step Visual Reasoning in LLMs
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
lucidrains/transfusion-pytorch
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI
FoundationVision/LlamaGen
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
FoundationVision/Infinity
Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
PKU-YuanGroup/Next-Patch-Prediction
AILab-CVC/SEED-X
Multimodal Models in Real World
mit-han-lab/vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
FoundationVision/VAR
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
ByteFlow-AI/TokenFlow
[CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
deepcs233/Visual-CoT
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning