zhangjiewu

PhD student @ NUS

National University of Singapore, Singapore

zhangjiewu's Stars

PKU-YuanGroup/Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.
Language: Python · 11.6k 153 3561k
facebookresearch/segment-anything-2
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Language: Jupyter Notebook · 11.1k 64 259944
Stability-AI/StableCascade
Official Code for Stable Cascade
Language: Jupyter Notebook · 6.6k 61 123533
facebookresearch/DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Language: Python · 6.4k 44 81578
AILab-CVC/YOLO-World
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Language: Python · 4.7k 40 467460
AILab-CVC/VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Language: Python · 4.6k 71 84342
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Language: Python · 4.3k 116 83321
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
Language: Python · 3.2k 37 153254
Doubiiu/DynamiCrafter
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Language: Python · 2.6k 33 136210
IDEA-Research/T-Rex
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Language: Python · 2.3k 38 86149
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Language: Python · 1.7k 38 12584
NUS-HPC-AI-Lab/OpenDiT
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
Language: Python · 1.4k 23 6093
baofff/U-ViT
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
Language: Jupyter Notebook · 927 12 2861
NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.
Language: Python · 840 19 2543
chuanyangjin/fast-DiT
Fast Diffusion Models with Transformers
Language: Python · 748 6 1597
willisma/SiT
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
Language: Python · 696 9 2738
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Language: Python · 601 9 2324
lucidrains/magvit2-pytorch
Implementation of MagViT2 Tokenizer in Pytorch
Language: Python · 565 28 3534
iejMac/video2dataset
Easily create large video datasets from video URLs
Language: Python · 552 9 15665
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
Language: Jupyter Notebook · 543 14 3929
Zhen-Dong/Magic-Me
Code for ID-Specific Video Customized Diffusion
Language: Python · 465 14 1338
showlab/DragAnything
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Language: Python · 438 16 2415
Anima-Lab/MaskDiT
Code for Fast Training of Diffusion Models with Masked Transformers
Language: Python · 377 13 1914
BraveGroup/Drive-WM
[CVPR 2024] A world model for autonomous driving.
Language: Python · 313 22 57
Q-Future/Q-Align
③[ICML2024] [IQA, IAA, VQA] All-in-one Foundation Model for visual scoring. Can efficiently fine-tune to downstream datasets.
Language: Python · 301 2 3721
showlab/Awesome-GUI-Agent
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
297 7 312
zhaohengyuan1/Genixer
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
Language: Python · 108 3 00
sayakpaul/single-video-curation-svd
Educational repository for applying the main video data curation techniques presented in the Stable Video Diffusion paper.
Language: Jupyter Notebook · 81 3 17
nguyentthong/video-language-understanding
[ACL’24 Findings] Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
34 1 00
Yanqing0327/MLLMs-Augmented
The official implementation of "MLLMs-Augmented Visual-Language Representation Learning"
Language: Python · 31 4 11
