Sarah-Xing's Stars
rosinality/glow-pytorch
PyTorch implementation of Glow
MontrealCorpusTools/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
facebookresearch/AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
y-ren16/TiCodec
hojonathanho/diffusion
Denoising Diffusion Probabilistic Models
dongzhuoyao/awesome-flow-matching
A summary of related works about flow matching, stochastic interpolants
jmtomczak/intro_dgm
"Deep Generative Modeling": Introductory Examples
facebookresearch/flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
labmlai/annotated_deep_learning_paper_implementations
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
ddlBoJack/Awesome-Speech-Language-Model
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
lucidrains/voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Vincentqyw/cv-arxiv-daily
🎓Automatically Update CV Papers Daily using Github Actions (Update Every 2days)
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
supertone-inc/super-monotonic-align
Plachtaa/VITS-fast-fine-tuning
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities。
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
JunityZhan/Understanding-VITS
In this repository, you will learn how code works in VITS(Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) in Jupyter Notebooks, including normalizing data, training process, inference process, and model's details.
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
ming024/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"