wangyang2014

wangyang2014's Stars

ZheC/Realtime_Multi-Person_Pose_Estimation
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)
Language: Jupyter Notebook, 5.1k stars, 1.4k forks
affige/genmusic_demo_list
A list of demo websites for automatic music generation research
64844
amirbar/speech2gesture
Code for training the models from the paper "Learning Individual Styles of Conversational Gestures"
Language:Python38044
JishengBai/AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
Language:Python1092
Anynoumsiccv9970/G2P-DDM
Language:Python92
jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
24414
MatthewCYM/VoiceBench
VoiceBench: Benchmarking LLM-Based Voice Assistants
Language:Python854
bytedance/X-Portrait
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
Language:Python44136
PantoMatrix/PantoMatrix
PantoMatrix: Generating Face and Body Animation from Speech
Language: Python, 1k stars, 181 forks
TencentGameMate/chinese_speech_pretrain
Chinese speech pretrained models
Language: Shell, 1.1k stars, 89 forks
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Language:Python71154
Winn1y/Awesome-Human-Motion-Video-Generation
Human Motion Video Generation: A Survey (https://www.techrxiv.org/users/836049/articles/1228135-human-motion-video-generation-a-survey)
1334
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
Language: Python, 1.6k stars, 129 forks
dongxiaoke/VASA-1
Implementation of VASA-1
157
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
Language: Python, 3.3k stars, 255 forks
Stability-AI/generative-models
Generative Models by Stability AI
Language: Python, 25k stars, 2.8k forks
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language: Python, 8.8k stars, 1.2k forks
yzGuu830/efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Language:Jupyter Notebook984
AdolfVonKleist/Phonetisaurus
Phonetisaurus G2P
Language:Shell457122
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
Language:Python12611
lucidrains/MIMO-pytorch
PyTorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group
Language:Python1296
Kyubyong/g2p
g2p: English Grapheme To Phoneme Conversion
Language:Python829129
unclecode/crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Language: Python, 23.7k stars, 1.7k forks
lifeiteng/OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
Language:Python78730
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python, 4.7k stars, 453 forks
lucidrains/rectified-flow-pytorch
Implementation of rectified flow and some of its follow-up research and improvements in PyTorch
Language:Python2269
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Language: Python, 37.5k stars, 4.6k forks
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Language:Jupyter Notebook815104
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
6.1k stars, 337 forks
kyutai-labs/moshi
Language: Python, 7.1k stars, 556 forks