wangyang2014's Stars
ZheC/Realtime_Multi-Person_Pose_Estimation
Code repo for realtime multi-person pose estimation in CVPR'17 (Oral)
affige/genmusic_demo_list
A list of demo websites for automatic music generation research
amirbar/speech2gesture
Code for training the models from the paper "Learning Individual Styles of Conversational Gestures"
JishengBai/AudioSetCaps
A 6-Million Audio-Caption Paired Dataset Built with an LLM- and ALM-Based Automatic Pipeline
Anynoumsiccv9970/G2P-DDM
jishengpeng/WavChat
A Survey of Spoken Dialogue Models (60 pages)
MatthewCYM/VoiceBench
VoiceBench: Benchmarking LLM-Based Voice Assistants
bytedance/X-Portrait
Source code for the SIGGRAPH 2024 paper "X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention"
PantoMatrix/PantoMatrix
PantoMatrix: Generating Face and Body Animation from Speech
TencentGameMate/chinese_speech_pretrain
Chinese speech pretrained models
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Winn1y/Awesome-Human-Motion-Video-Generation
Human Motion Video Generation: A Survey (https://www.techrxiv.org/users/836049/articles/1228135-human-motion-video-generation-a-survey)
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
dongxiaoke/VASA-1
Implementation of VASA-1
MooreThreads/Moore-AnimateAnyone
Character Animation (AnimateAnyone, Face Reenactment)
Stability-AI/generative-models
Generative Models by Stability AI
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
yzGuu830/efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
AdolfVonKleist/Phonetisaurus
Phonetisaurus G2P
yangdongchao/RSTnet
Real-time Speech-Text Foundation Model Toolkit (wip)
lucidrains/MIMO-pytorch
PyTorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group
Kyubyong/g2p
g2p: English Grapheme To Phoneme Conversion
unclecode/crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
lifeiteng/OmniSenseVoice
Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
lucidrains/rectified-flow-pytorch
Implementation of rectified flow and some of its follow-up research / improvements in PyTorch
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
kyutai-labs/moshi