wangyang2014's Stars
CMU-Perceptual-Computing-Lab/openpose
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
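A minimal sketch of text-conditioned generation with MusicGen via the audiocraft Python API; the checkpoint name (facebook/musicgen-small), the 8-second duration, and the prompt are illustrative assumptions, not taken from this list:

```python
# Hedged sketch: text-to-music generation with MusicGen from audiocraft.
# Checkpoint name, duration, and prompt are illustrative assumptions.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")  # assumed small checkpoint
model.set_generation_params(duration=8)                     # roughly 8 seconds of audio

descriptions = ["warm lo-fi beat with soft piano"]           # textual conditioning
wav = model.generate(descriptions)                           # batch of waveforms at model.sample_rate

for idx, one_wav in enumerate(wav):
    # audio_write appends the file extension and applies loudness normalization
    audio_write(f"sample_{idx}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```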
deepseek-ai/DeepSeek-V3
Tencent/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generative Models
yangchris11/samurai
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
open-mmlab/mmpose
OpenMMLab Pose Estimation Toolbox and Benchmark.
ZheC/Realtime_Multi-Person_Pose_Estimation
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)
GuyTevet/motion-diffusion-model
The official PyTorch implementation of the paper "Human Motion Diffusion Model"
Lightricks/LTX-Video
Official repository for LTX-Video
antgroup/echomimic_v2
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Text-to-Audio/AudioLCM
PyTorch implementation of AudioLCM (ACM-MM'24): an efficient and high-quality text-to-audio generation model based on a latent consistency model.
kakaobrain/rq-vae-transformer
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
affige/genmusic_demo_list
a list of demo websites for automatic music generation research
ivcylc/OpenMusic
OpenMusic: SOTA Text-to-music (TTM) Generation
SeanChenxy/Hand3DResearch
yzhang2016/video-generation-survey
A reading list on video generation
hugofloresgarcia/vampnet
music generation with masked transformers!
ai4r/Gesture-Generation-from-Trimodal-Context
Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)
DiffPoseTalk/DiffPoseTalk
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
yhw-yhw/SHOW
The codebase for SHOW from "Generating Holistic 3D Human Motion from Speech" [CVPR2023]
YanzuoLu/CFLD
[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
ZhengdiYu/Arbitrary-Hands-3D-Reconstruction
🔥(CVPR 2023) ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
CNChTu/FCPE
Frank-ZY-Dou/EMDM
X-E-Speech/X-E-Speech-code
X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion
thuhcsi/S2G-MDDiffusion
TIGER-AI-Lab/VideoScore
official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP2024]
PKBHY/WaveFM
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching
ffxzh/KMTalk
[ECCV2024 official] KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding