wsstriving

Shanghai Jiao Tong University

wsstriving's Stars

openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
Language: Python 68.7k 575 08.1k
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Language: Python 25.4k 197 4.1k5.3k
ShiqiYu/libfacedetection
An open source library for face detection in images. The face detection speed can reach 1000FPS.
Language: C++ 12.3k 531 3183k
facebookresearch/metaseq
Repo for external large-scale work
Language: Python 6.5k 112 294724
pytorch/audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
Language: Python 2.5k 72 933644
lucidrains/vector-quantize-pytorch
Vector (and Scalar) Quantization, in Pytorch
Language: Python 2.5k 30 120197
iver56/audiomentations
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Language: Python 1.8k 20 181187
bytedance/music_source_separation
Language: Python 1.3k 26 64194
aliutkus/speechmetrics
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Language: Python 894 23 33153
NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Language: Python 854 71 097
google/cld3
Language: C++ 778 34 63110
huawei-noah/Speech-Backbones
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Language: Jupyter Notebook 557 23 29115
wq2012/SpectralCluster
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
Language: Python 508 19 4573
wenet-e2e/WeTextProcessing
Text Normalization & Inverse Text Normalization
Language: Python 455 10 11267
Emotional-Text-to-Speech/dl-for-emo-tts
:computer: :robot: A summary on our attempts at using Deep Learning approaches for Emotional Text to Speech :speaker:
Language: Jupyter Notebook 416 9 744
adobe-research/DeepAFx-ST
DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/
Language: Python 360 12 945
mjhydri/BeatNet
BeatNet is a state-of-the-art real-time and offline joint music beat, downbeat, tempo, and meter tracking system using CRNN and particle filtering (implementation of the ISMIR 2021 paper).
Language: Python 324 8 2755
NVIDIA/radtts
Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: Flow-based TTS models with Robust Alignment Learning, Diverse Synthesis, and Generative Modeling and Fine-Grained Control over Low Dimensional (F0 and Energy) Speech Attributes.
Language: Roff 281 15 3040
vb000/Waveformer
A deep neural network architecture for low-latency audio processing
Language: Python 281 6 434
bytedance/ParaGen
ParaGen is a PyTorch deep learning framework for parallel sequence generation.
Language: Python 186 10 123
keonlee9420/Comprehensive-E2E-TTS
A Non-Autoregressive End-to-End Text-to-Speech (text-to-wav), supporting a family of SOTA unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ultimate E2E-TTS.
Language: Python 143 11 519
fcaspe/ddx7
Differentiable FM Synthesis of Musical Instrument Sounds
Language: Python 126 3 27
brentspell/torch-yin
Yin pitch estimator in PyTorch
Language: Python 114 6 17
csukuangfj/kaldi-native-fbank
Kaldi-compatible online fbank extractor without external dependencies
Language: C++ 74 4 1120
thuhcsi/SpanPSP
Language: Python 74 2 919
luferrer/DCA-PLDA
Discriminative Condition-Aware PLDA
Language: Python 42 5 97
azraelkuan/repgan
RepVgg + HiFiGAN
Language: Python 33 4 66
bsxfan/PSDA
Probabilistic Spherical Discriminant Analysis
Language: Python 12 3 13
bsxfan/Toroidal-PSDA
A probabilistic scoring backend for length-normalized embeddings.
Language: Python 10 6 01
azraelkuan/azraelkuan
2 1 00
