alex4727's Stars
adobe-research/MagicFixup
showlab/DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
KwaiVGI/LivePortrait
Bring portraits to life!
Jiayi-Pan/TinyZero
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
ZiyuGuo99/Image-Generation-CoT
Investigating CoT Reasoning in Autoregressive Image Generation
tgxs002/HPSv2
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
deepseek-ai/DeepSeek-V3
huggingface/open-r1
Fully open reproduction of DeepSeek-R1
DAMO-NLP-SG/VideoLLaMA3
Frontier Multimodal Foundation Models for Image and Video Understanding
sihyun-yu/REPA
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think (ICLR 2025)
cure-lab/PnPInversion
[ICLR 2024] Official repo for paper "PnP Inversion: Boosting Diffusion-based Editing with 3 Lines of Code"
guanyingc/cv_rebuttal_template
MRzzm/HDTF
The dataset and code for "Flow-guided One-shot Talking Face Generation with a High-resolution Audio-visual Dataset"
Tencent/Hunyuan3D-2
High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models.
zacharyhorvitz/Fk-Diffusion-Steering
A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.
wyhsirius/LIA
[ICLR 22, TPAMI 24] LIA: Latent Image Animator
harlanhong/awesome-talking-head-generation
JosephPai/Awesome-Talking-Face
📖 A curated list of resources dedicated to talking face.
Lightricks/LTX-Video
Official repository for LTX-Video
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
brownvc/R3GAN
Code for NeurIPS 2024 paper - The GAN is dead; long live the GAN! A Modern Baseline GAN - by Huang et al.
abinthomasonline/repo2txt
Web-based tool that converts GitHub repository contents into a single formatted text file
OpenBMB/MiniCPM-o
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
TIGER-AI-Lab/AnyV2V
Code and data for "AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks" (TMLR 2024)
microsoft/TRELLIS
Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation".
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
pkunlp-icler/FastV
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
dvlab-research/VisionZip
Official repository for VisionZip (CVPR 2025)
bytedance/1d-tokenizer
This repo contains the code for the 1D tokenizer and generator