zhangjiewu's Stars
PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Stability-AI/StableCascade
Official Code for Stable Cascade
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
AILab-CVC/VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
Doubiiu/DynamiCrafter
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
IDEA-Research/T-Rex
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
NUS-HPC-AI-Lab/OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
baofff/U-ViT
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
willisma/SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in PyTorch
iejMac/video2dataset
Easily create large video datasets from video URLs
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Zhen-Dong/Magic-Me
Code for ID-Specific Video Customized Diffusion
showlab/DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Anima-Lab/MaskDiT
Code for Fast Training of Diffusion Models with Masked Transformers
BraveGroup/Drive-WM
[CVPR 2024] A world model for autonomous driving.
Q-Future/Q-Align
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
zhaohengyuan1/Genixer
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
sayakpaul/single-video-curation-svd
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.
nguyentthong/video-language-understanding
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Yanqing0327/MLLMs-Augmented
The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"