Pinned Repositories
CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.
attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
du2020dan
The implementation of our paper "Double adversarial networks for monaural speech enhancement" accepted by INTERSPEECH 2020.
du2020kws
This is a small-footprint, robust KWS system based on multi-conditional training, retraining, and joint training. It includes a small-footprint KWS system and a small-footprint speech enhancement model. We also investigate a compression method for CNN and LSTM.
du2022sond
Speaker overlap-aware Neural Diarization
food_is_unstopped
Food is unstopped!!!! GO!
speech_feature_extractor
Some useful features for speech processing, such as MFCC, gammatone filterbank, GFCC, spectrum (power spectrum and log-power spectrum), Amplitude Modulation Spectrum (AMS), and so on.
zhihaodu.github.io
ZhihaoDU's Repositories
ZhihaoDU/speech_feature_extractor
Some useful features for speech processing, such as MFCC, gammatone filterbank, GFCC, spectrum (power spectrum and log-power spectrum), Amplitude Modulation Spectrum (AMS), and so on.
ZhihaoDU/du2022sond
Speaker overlap-aware Neural Diarization
ZhihaoDU/du2020dan
The implementation of our paper "Double adversarial networks for monaural speech enhancement" accepted by INTERSPEECH 2020.
ZhihaoDU/food_is_unstopped
Food is unstopped!!!! GO!
ZhihaoDU/du2020kws
This is a small-footprint, robust KWS system based on multi-conditional training, retraining, and joint training. It includes a small-footprint KWS system and a small-footprint speech enhancement model. We also investigate a compression method for CNN and LSTM.
ZhihaoDU/zhihaodu.github.io
ZhihaoDU/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
ZhihaoDU/improved-gan
Code for the paper "Improved Techniques for Training GANs"
ZhihaoDU/asteroid
The PyTorch-based audio source separation toolkit for researchers || Pretrained models available
ZhihaoDU/awesome-diarization
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
ZhihaoDU/compare_gan
Compare GAN code.
ZhihaoDU/DDAEC
ZhihaoDU/demo_train
ZhihaoDU/espnet
End-to-End Speech Processing Toolkit
ZhihaoDU/FeatureEmbedding
Feature Embedding
ZhihaoDU/FloWaveNet
A PyTorch implementation of "FloWaveNet: A Generative Flow for Raw Audio"
ZhihaoDU/griffin_lim
Implementation of the Griffin and Lim algorithm to recover an audio signal from a magnitude-only spectrogram.
ZhihaoDU/huxpro.github.io
My Blog / Jekyll Themes / PWA
ZhihaoDU/kaldi_feat_enh
Enhancement model for Kaldi features
ZhihaoDU/neos_speech_utils
Speech utilities that may be useful for researchers in speech separation, speech enhancement, and speech synthesis. Enjoy!
ZhihaoDU/progressive_growing_of_gans
Progressive Growing of GANs for Improved Quality, Stability, and Variation
ZhihaoDU/pytorch-CycleGAN-and-pix2pix
Image-to-image translation in PyTorch (e.g., horse2zebra, edges2cats, and more)
ZhihaoDU/pytorch-spectral-normalization-gan
Paper by Miyato et al. https://openreview.net/forum?id=B1QRgziT-
ZhihaoDU/stylegan
StyleGAN - Official TensorFlow Implementation
ZhihaoDU/tf-kaldi-speaker
Neural speaker recognition/verification system based on Kaldi and TensorFlow
ZhihaoDU/torch-two-sample
A PyTorch library for two-sample tests