choihk6610's Stars
Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
clovaai/ClovaCall
ClovaCall dataset and Pytorch LAS baseline code (Interspeech 2020)
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
facebookresearch/flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
modelscope/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
BytedanceSpeech/seed-tts-eval
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
fakufaku/fast_bss_eval
A fast implementation of bss_eval metrics for blind source separation
snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
nomadkaraoke/python-audio-separator
Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
siyuhuang/QuantArt
Official PyTorch implementation of QuantArt (CVPR2023)
haoheliu/versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
kyutai-labs/moshi
karpathy/LLM101n
LLM101n: Let's build a Storyteller
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
fishaudio/fish-speech
SOTA Open Source TTS
rtqichen/torchdiffeq
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
NeuralVox/OpenPhonemizer
An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GPL phonemizer.
facebookresearch/fastText
Library for fast text representation and classification.
jinwonkim93/transformer-tts
Rikorose/DeepFilterNet
Noise supression using deep filtering
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
facebookresearch/svoice
We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers In which, we present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
MoonInTheRiver/DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
ex3ndr/supervoice-vall-e-2
VALL-E 2 reproduction
joonaskalda/PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings" published at Odyssey 2024