hyzhan

Guangzhou

Pinned Repositories

auraloss
Collection of audio-focused loss functions in PyTorch
Language: Python
caffe
Caffe: a fast open framework for deep learning.
Language: C++
caffe-fast-rcnn
Caffe fork that supports Fast R-CNN
Language: C++
Chinese_conversation_sentiment
A Chinese sentiment dataset that may be useful for sentiment analysis.
ClariNet
A PyTorch implementation of ClariNet
Language: Python
cnn_graph
Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
Language: Jupyter Notebook
code01
Language: Python
CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python
deep-voice-conversion
Deep neural networks for voice conversion (voice style transfer) in TensorFlow
Language: Python
denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
Language: Python

hyzhan's Repositories

hyzhan/auraloss
Collection of audio-focused loss functions in PyTorch
Language: Python
hyzhan/code01
Language: Python
hyzhan/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python
hyzhan/denoiser
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model working on the raw waveform that runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and generalization.
Language: Python
hyzhan/g2pM
A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset
Language: Python
hyzhan/gaze-tracking-pipeline
Full camera-to-screen gaze tracking pipeline
Language: Python
hyzhan/gcn
Implementation of Graph Convolutional Networks in TensorFlow
Language: Python
hyzhan/GPT-SoVITS
One minute of voice data can be enough to train a good TTS model (few-shot voice cloning).
Language: Python
hyzhan/GPT2-Chinese
Chinese version of GPT2 training code, using BERT tokenizer.
Language: Python
hyzhan/grafx
GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch
Language: Python
hyzhan/hyzhan.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Language: JavaScript
hyzhan/ICASSP2022
Language: HTML
hyzhan/Interspeech2021
Interspeech2021
Language: HTML
hyzhan/lightconv_pt
lightconv_layer fairseq
hyzhan/Montreal-Forced-Aligner
Command line utility for forced alignment using Kaldi
Language: Python
hyzhan/NAC-TTS
Language: HTML
hyzhan/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
hyzhan/phonological-features
Materials accompanying the paper "Phonological features for 0-shot multilingual speech synthesis"
Language: Python
hyzhan/PyTorch-BigGraph
Software used for generating embeddings from large-scale graph-structured data.
Language: Python
hyzhan/Real-Time-Voice-Cloning
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Language: Python
hyzhan/spleeter
Deezer source separation library including pretrained models.
Language: Python
hyzhan/StyleDubber
[ACL 2024] PyTorch code for the paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
Language: Python
hyzhan/TTS_TFLite
This repository is a collection of TTS Models in TFLite
Language: Jupyter Notebook
hyzhan/ubisoft-laforge-daft-exprt
Language: Python
hyzhan/vall-e
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Language: Python
hyzhan/vits_chinese
Best TTS based on BERT and VITS, with some NaturalSpeech features from Microsoft
hyzhan/voice-filter
An unofficial PyTorch implementation of Google's VoiceFilter
Language: Python
hyzhan/voice_conversion
Language: Python
hyzhan/w2v2-how-to
How to use our public wav2vec2 dimensional emotion model
Language: Jupyter Notebook
hyzhan/WaveRNN-Pytorch
Fatcord's Alternative WaveRNN (Faster training)
Language: Python