wxue-audio

wxue-audio's Stars

fishaudio/fish-speech
SOTA Open Source TTS
Language: Python, 18.1k stars, 1.4k forks
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python, 21.2k stars, 2.2k forks
hhguo/EA-SVC
An implementation of "Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training"
Language:Python12433
mathigatti/DeepSingingSynthesizer
Extension of Sinsy-NG using deep learning models for voice conversion in order to synthesize good and realistic vocals.
Language:Python13
jim-schwoebel/voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
1.8k stars, 229 forks
wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language: Python, 4.2k stars, 1.1k forks
Samsung/ONE
On-device Neural Engine
Language:C++445157
pyannote/pyannote-metrics
A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
Language:Python19734
cpuimage/rnnoise
Recurrent neural network for audio noise reduction
Language:C24290
xiph/rnnoise
Recurrent neural network for audio noise reduction
Language: C, 4.2k stars, 913 forks
jzi040941/PercepNet
Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech
Language:C++33594
facebookresearch/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
Language: Python, 1.7k stars, 302 forks
vimalmanohar/kaldi
Fork of the official kaldi.
Language:Shell223
asteroid-team/asteroid
The PyTorch-based audio source separation toolkit for researchers
Language: Python, 2.3k stars, 425 forks
schmiph2/pysepm
Python implementation of performance metrics in Loizou's Speech Enhancement book
Language:Python39787
kaituoxu/Conv-TasNet
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).
Language:Python687156
naplab/Conv-TasNet
Language:Python29170
wangkenpu/Conv-TasNet-PyTorch
A PyTorch implementation of Conv-TasNet
Language:Python4611
WenzheLiu-Speech/awesome-speech-enhancement
Speech enhancement, speech separation, sound source localization
1.1k stars, 223 forks
microsoft/DNS-Challenge
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
Language: Python, 1.1k stars, 417 forks
JoergFranke/phoneme_recognition
Phoneme Recognition using RecNet
Language:Python9529
hyperconnect/TC-ResNet
Code for Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Language:Python22256
ododoyo/EHNet
A neural network consisting of a CNN and an LSTM for speech enhancement
Language:Python2410
vbelz/Speech-enhancement
Deep learning for audio denoising
Language:Python670126
haoxiangsnr/Wave-U-Net-for-Speech-Enhancement
A PyTorch implementation of Wave-U-Net, adapted to speech enhancement.
Language:Python32667
eesungkim/Speech_Enhancement_DNN_NMF
Speech Enhancement based on DNN (Spectral-Mapping, TF-Masking), DNN-NMF, NMF
Language:Python17860
Tony607/Keras_Deep_Clustering
How to do Unsupervised Clustering with Keras
Language:Jupyter Notebook239137
Tony607/Keras-Trigger-Word
How to do Real Time Trigger Word Detection with Keras | DLology
Language:Jupyter Notebook16354
kaituoxu/TasNet
A PyTorch implementation of Time-domain Audio Separation Network (TasNet) with Permutation Invariant Training (PIT) for speech separation.
Language:Python11231
kaituoxu/Speech-Transformer
A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.
Language:Python776196