wsstriving's Stars
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
ShiqiYu/libfacedetection
An open source library for face detection in images. The face detection speed can reach 1000 FPS.
facebookresearch/metaseq
Repo for external large-scale work
pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
iver56/audiomentations
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
bytedance/music_source_separation
aliutkus/speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
google/cld3
huawei-noah/Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
wq2012/SpectralCluster
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
Emotional-Text-to-Speech/dl-for-emo-tts
💻 🤖 A summary of our attempts at using Deep Learning approaches for Emotional Text to Speech 🔊
adobe-research/DeepAFx-ST
DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/
mjhydri/BeatNet
BeatNet is a state-of-the-art real-time and offline joint music beat, downbeat, tempo, and meter tracking system using a CRNN and particle filtering (implementation of the ISMIR 2021 paper).
NVIDIA/radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low-Dimensional (F0 and Energy) Speech Attributes.
vb000/Waveformer
A deep neural network architecture for low-latency audio processing
bytedance/ParaGen
ParaGen is a PyTorch deep learning framework for parallel sequence generation.
keonlee9420/Comprehensive-E2E-TTS
A non-autoregressive end-to-end text-to-speech (text-to-wav) system, supporting a family of SOTA unsupervised duration models. This project grows with the research community, aiming to achieve the ultimate E2E-TTS.
fcaspe/ddx7
Differentiable FM Synthesis of Musical Instrument Sounds
brentspell/torch-yin
Yin pitch estimator in PyTorch
csukuangfj/kaldi-native-fbank
Kaldi-compatible online fbank extractor without external dependencies
thuhcsi/SpanPSP
luferrer/DCA-PLDA
Discriminative Condition-Aware PLDA
azraelkuan/repgan
RepVgg + HiFiGAN
bsxfan/PSDA
Probabilistic Spherical Discriminant Analysis
bsxfan/Toroidal-PSDA
A probabilistic scoring backend for length-normalized embeddings.
azraelkuan/azraelkuan