This repo provides the official PyTorch implementation of HN-uSFGAN, a high-fidelity and pitch-controllable neural vocoder based on unified source-filter networks.
HN-uSFGAN is an extension of uSFGAN, and this repo also includes the original uSFGAN implementation with some modifications.
For more information, please see the demo.
This repository has been tested in the following environment.
- Ubuntu 20.04.3 LTS
- Titan RTX 3090 GPU
- Python 3.9.5
- CUDA 11.5
- cuDNN 8.1.1.33-1+cuda11.2
$ cd HN-UnifiedSourceFilterGAN
$ pip install -e .
Please refer to the Parallel WaveGAN repo for more details.
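If you want to keep the dependencies isolated, one option is to install inside a virtual environment first; a minimal sketch (the environment name venv is arbitrary, and any environment manager works):
# Optional: create and activate a virtual environment, then install in editable mode
$ python -m venv venv
$ source venv/bin/activate
$ pip install -e .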
- egs: The folder for projects.
- egs/vctk: The folder of the VCTK example project.
- usfgan: The folder of the source code.
In this repo, hyperparameters are managed using Hydra.
Hydra provides an easy way to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
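For example, any value of the composed configuration can be overridden on the command line with Hydra's key=value syntax. The config group names below are taken from the training command later in this README, while train.batch_size is only a hypothetical key used for illustration; check the YAML files under usfgan/bin/config for the actual names.
# Override config values from the command line (train.batch_size is a hypothetical key)
$ usfgan-train generator=parallel_hn_usfgan discriminator=hifigan train=hn_usfgan data=vctk_24kHz out_dir=exp/debug train.batch_size=8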
Prepare your dataset and create scp files listing the paths to each audio file (e.g., egs/vctk/data/scp/vctk_train_24kHz.scp).
Also, list files giving the paths to the features extracted in the next step are required (e.g., egs/vctk/data/scp/vctk_train_24kHz.list).
Note that separate scp/list files are needed for the training, validation, and evaluation sets.
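For reference, an scp file here is assumed to be a plain-text list with one audio file path per line, and a .list file holds the paths of the extracted feature files in the same way. The paths below are placeholders for your own data:
/path/to/corpus/p225/p225_001.wav
/path/to/corpus/p225/p225_002.wav
...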
# Move to the project directory
$ cd egs/vctk
# Extract acoustic features (F0, mel-cepstrum, etc.)
# You can customize the parameters in usfgan/bin/config/extract_features.yaml
$ usfgan-extract-features audio=data/scp/vctk_all_24kHz.scp
# Compute statistics of training and testing data
$ usfgan-compute-statistics feats=data/scp/vctk_train_24kHz.list stats=data/stats/vctk_train_24kHz.joblib
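To sanity-check the result, the .joblib extension suggests a standard joblib dump, so the statistics can be inspected directly from Python (this assumes the file is a plain joblib dump and that joblib is installed):
# Quick inspection of the computed statistics (assumes a plain joblib dump)
$ python -c "import joblib; print(joblib.load('data/stats/vctk_train_24kHz.joblib'))"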
# Train a model customizing the hyperparameters as you like
# The following setting (Parallel-HN-uSFGAN generator with HiFi-GAN discriminator) should give the best performance
$ usfgan-train generator=parallel_hn_usfgan discriminator=hifigan train=hn_usfgan data=vctk_24kHz out_dir=exp/parallel_hn_usfgan
# Decode with natural acoustic features
$ usfgan-decode out_dir=exp/parallel_hn_usfgan/wav/600000steps checkpoint_path=exp/parallel_hn_usfgan/checkpoints/checkpoint-600000steps.pkl
# Decode with halved F0
$ usfgan-decode out_dir=exp/parallel_hn_usfgan/wav/600000steps checkpoint_path=exp/parallel_hn_usfgan/checkpoints/checkpoint-600000steps.pkl f0_factor=0.50
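The f0_factor option scales the input F0 contour (0.50 above halves it), so other ratios follow the same pattern; for example, doubling the pitch:
# Decode with doubled F0
$ usfgan-decode out_dir=exp/parallel_hn_usfgan/wav/600000steps checkpoint_path=exp/parallel_hn_usfgan/checkpoints/checkpoint-600000steps.pkl f0_factor=2.00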
# Monitor training progress with TensorBoard
$ tensorboard --logdir exp
If you find the code helpful, please cite the following article.
@inproceedings{yoneyama22_interspeech,
  author={Reo Yoneyama and Yi-Chiao Wu and Tomoki Toda},
  title={{Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation}},
  year={2022},
  booktitle={Proc. Interspeech 2022},
  pages={848--852},
  doi={10.21437/Interspeech.2022-11130}
}
Development:
Reo Yoneyama @ Nagoya University (@chomeyama)
E-mail: yoneyama.reo@g.sp.m.is.nagoya-u.ac.jp
Advisors:
Yi-Chiao Wu @ Nagoya University (@bigpon)
E-mail: yichiao.wu@g.sp.m.is.nagoya-u.ac.jp
Tomoki Toda @ Nagoya University
E-mail: tomoki@icts.nagoya-u.ac.jp