Pinned Repositories
auraloss
Collection of audio-focused loss functions in PyTorch
caffe
Caffe: a fast open framework for deep learning.
caffe-fast-rcnn
Caffe fork that supports Fast R-CNN
Chinese_conversation_sentiment
A Chinese sentiment dataset that may be useful for sentiment analysis.
ClariNet
A PyTorch implementation of ClariNet
cnn_graph
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
code01
CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
deep-voice-conversion
Deep neural networks for voice conversion (voice style transfer) in TensorFlow
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we propose a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and its generalization abilities.
hyzhan's Repositories
hyzhan/auraloss
Collection of audio-focused loss functions in PyTorch
hyzhan/code01
hyzhan/CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
hyzhan/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we propose a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and its generalization abilities.
hyzhan/g2pM
A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
hyzhan/gaze-tracking-pipeline
Full camera-to-screen gaze tracking pipeline
hyzhan/gcn
Implementation of Graph Convolutional Networks in TensorFlow
hyzhan/GPT-SoVITS
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
hyzhan/GPT2-Chinese
Chinese version of GPT-2 training code, using the BERT tokenizer.
hyzhan/grafx
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
hyzhan/hyzhan.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
hyzhan/ICASSP2022
hyzhan/Interspeech2021
Interspeech2021
hyzhan/lightconv_pt
lightconv_layer fairseq
hyzhan/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
hyzhan/NAC-TTS
hyzhan/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
hyzhan/phonological-features
Materials accompanying the paper "Phonological features for 0-shot multilingual speech synthesis"
hyzhan/PyTorch-BigGraph
Software used for generating embeddings from large-scale graph-structured data.
hyzhan/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
hyzhan/spleeter
Deezer source separation library including pretrained models.
hyzhan/StyleDubber
[ACL 2024] This is the PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
hyzhan/TTS_TFLite
This repository is a collection of TTS Models in TFLite
hyzhan/ubisoft-laforge-daft-exprt
hyzhan/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-to-Speech)
hyzhan/vits_chinese
Best TTS based on BERT and VITS, with some features from Microsoft's NaturalSpeech
hyzhan/voice-filter
An unofficial PyTorch implementation of Google's VoiceFilter
hyzhan/voice_conversion
hyzhan/w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
hyzhan/WaveRNN-Pytorch
Fatcord's Alternative WaveRNN (Faster training)