sysuyy's Stars
riverstone496/awesome-second-order-optimization
OpenDriveLab/AgiBot-World
World's First Large-scale High-quality Robotic Manipulation Benchmark
deepseek-ai/DeepSeek-V3
adalkiran/llama-nuts-and-bolts
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
xichenpan/ARLDM
Official PyTorch Implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
BlinkDL/RWKV-LM
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.
OliverRensu/FlowAR
“FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching”. FlowAR employs the simplest scale design and is compatible with any VAE.
buoyancy99/diffusion-forcing
Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"
SimarKareer/EgoMimic
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
GAIR-NLP/O1-Journey
O1 Replication Journey: A Strategic Progress Report – Part I
rom1504/img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
allenai/awesome-open-source-lms
Friends of OLMo and their links.
facebookresearch/flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
JunyaoHu/common_metrics_on_video_quality
You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.
songweige/content-debiased-fvd
[CVPR 2024] On the Content Bias in Fréchet Video Distance
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
ByteFlow-AI/TokenFlow
🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".
omerbt/TokenFlow
Official PyTorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)
minyoungg/platonic-rep
VideoVerses/VideoTuna
Let's finetune video generation models!
Lightricks/LTX-Video
Official repository for LTX-Video
youngsheen/SimVQ
SimVQ: Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
mit-han-lab/vila-u
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
bytedance/1d-tokenizer
This repo contains the code for the 1D tokenizer and generator
ChaofanTao/Autoregressive-Models-in-Vision-Survey
A paper collection for autoregressive models in vision.
PKU-YuanGroup/WF-VAE
Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
PKU-RL/CLIP4MC
An RL-Friendly Vision-Language Model for Minecraft
mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics, etc.