Sarah-Xing

Speech recognition, Speaker recognition, Speech Diarization, Machine Learning, LLM

Sarah-Xing's Stars

rosinality/glow-pytorch
PyTorch implementation of Glow
Language:Python52395
MontrealCorpusTools/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
Language: Python, 1.4k stars, 252 forks
facebookresearch/AudioDec
An Open-source Streaming High-fidelity Neural Audio Codec
Language:Python45121
jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Language:Python92453
y-ren16/TiCodec
Language:Python604
hojonathanho/diffusion
Denoising Diffusion Probabilistic Models
Language: Python, 4k stars, 382 forks
dongzhuoyao/awesome-flow-matching
A summary of related works about flow matching, stochastic interpolants
35913
jmtomczak/intro_dgm
"Deep Generative Modeling": Introductory Examples
Language: Jupyter Notebook, 1.1k stars, 178 forks
facebookresearch/flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Language: Python, 1.6k stars, 60 forks
labmlai/annotated_deep_learning_paper_implementations
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Language: Python, 57.4k stars, 5.9k forks
ddlBoJack/Awesome-Speech-Language-Model
Paper, Code and Resources for Speech Language Model and End2End Speech Dialogue System.
14312
lucidrains/voicebox-pytorch
Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch
Language:Python62353
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language: Python, 8.3k stars, 1.1k forks
microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language: Python, 1.2k stars, 116 forks
shivammehta25/Matcha-TTS
[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
Language: Jupyter Notebook, 794 stars, 100 forks
Vincentqyw/cv-arxiv-daily
🎓Automatically Update CV Papers Daily using Github Actions (Update Every 2days)
Language: Python, 993 stars, 367 forks
lucidrains/e2-tts-pytorch
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
Language:Python39636
supertone-inc/super-monotonic-align
Language:Python1329
Plachtaa/VITS-fast-fine-tuning
This repo is a pipeline of VITS finetuning for fast speaker adaptation TTS, and many-to-many voice conversion
Language: Python, 4.8k stars, 718 forks
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python, 32.6k stars, 5k forks
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Language: Python, 2k stars, 513 forks
bytedance/SALMONN
SALMONN: Speech Audio Language Music Open Neural Network
Language: Python, 1.1k stars, 85 forks
gpt-omni/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Language: Python, 1.7k stars, 200 forks
facebookresearch/spiritlm
Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".
Language:Python85256
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python, 21.2k stars, 2.2k forks
JunityZhan/Understanding-VITS
In this repository, you will learn how code works in VITS(Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech) in Jupyter Notebooks, including normalizing data, training process, inference process, and model's details.
Language:Jupyter Notebook16425
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
Language: Python, 1.3k stars, 103 forks
lhotse-speech/lhotse
Tools for handling speech data in machine learning projects.
Language: Python, 966 stars, 221 forks
jaywalnut310/vits
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Language: Python, 7k stars, 1.3k forks
ming024/FastSpeech2
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
Language: Python, 1.9k stars, 543 forks
