choihk6610

choihk6610's Stars

Labbeti/aac-datasets
Audio Captioning datasets for PyTorch.
Language: Python 1106
clovaai/ClovaCall
ClovaCall dataset and PyTorch LAS baseline code (Interspeech 2020)
Language: Python 22056
YuanGongND/ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
Language: Python 40038
facebookresearch/flow_matching
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
Language: Python 1.7k 67
modelscope/ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Language: Python 1.9k 138
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
Language: Python 13.3k 1.1k
BytedanceSpeech/seed-tts-eval
Language: Python 1.1k 109
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python 4.7k 453
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Language: Jupyter Notebook 4k 355
fakufaku/fast_bss_eval
A fast implementation of bss_eval metrics for blind source separation
Language: Python 1328
snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple
Language: Jupyter Notebook 5.1k 324
nomadkaraoke/python-audio-separator
Easy-to-use stem (e.g. instrumental/vocals) separation from the CLI or as a Python package, using a variety of amazing pre-trained models (primarily from UVR)
Language: Python 57194
SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Language: Python 8.8k 1.1k
xingchensong/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
Language: Python 22430
siyuhuang/QuantArt
Official PyTorch implementation of QuantArt (CVPR2023)
Language: Python 1006
haoheliu/versatile_audio_super_resolution
Versatile audio super resolution (any -> 48kHz) with AudioSR.
Language: Python 1.2k 126
kyutai-labs/moshi
Language: Python 7.1k 555
karpathy/LLM101n
LLM101n: Let's build a Storyteller
30.9k 1.7k
webdataset/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Language: Python 2.4k 193
fishaudio/fish-speech
SOTA Open Source TTS
Language: Python 18.2k 1.4k
rtqichen/torchdiffeq
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.
Language: Python 5.7k 943
NeuralVox/OpenPhonemizer
An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GPL phonemizer.
Language: Python 885
facebookresearch/fastText
Library for fast text representation and classification.
Language: HTML 26k 4.7k
jinwonkim93/transformer-tts
Language: Python 1
Rikorose/DeepFilterNet
Noise suppression using deep filtering
Language: Python 2.6k 244
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Language: Jupyter Notebook 36.6k 4.3k
facebookresearch/svoice
We provide a PyTorch implementation of the paper "Voice Separation with an Unknown Number of Multiple Speakers", in which we present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps while keeping the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is used to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
Language: Python 1.3k 186
MoonInTheRiver/DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
Language: Python 4.4k 722
ex3ndr/supervoice-vall-e-2
VALL-E 2 reproduction
Language: Jupyter Notebook 10814
joonaskalda/PixIT
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings" published at Odyssey 2024
Language: Python 633