This repository provides an official PyTorch implementation of ConvS2S-VC.
ConvS2S-VC is a parallel many-to-many voice conversion (VC) method using a fully convolutional sequence-to-sequence (ConvS2S) model. The current version performs VC by first modifying the mel-spectrogram of input speech in accordance with a target speaker or style index, and then generating a waveform using a speaker-independent neural vocoder (Parallel WaveGAN or HiFi-GAN) from the converted mel-spectrogram.
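As a rough sketch of this two-stage pipeline (not the repository's actual API; the model class, the `convert` method, and all paths below are hypothetical placeholders, and `recipes/run_test.sh` is the real entry point):

```python
# Minimal sketch of the two-stage conversion pipeline described above.
# NOTE: the model loading, convert(), and all paths are hypothetical
# placeholders for illustration; use recipes/run_test.sh in practice.
import torch

model = torch.load("model/exp1/model.latest.pt", map_location="cpu")   # trained ConvS2S model (placeholder path)
vocoder = torch.load("pwg/egs/hifigan/model.pt", map_location="cpu")   # speaker-independent vocoder (placeholder path)

mel_src = torch.randn(1, 80, 400)   # mel-spectrogram of input speech: (batch, mel bins, frames)
trg_index = torch.tensor([2])       # target speaker/style index

with torch.no_grad():
    mel_conv = model.convert(mel_src, trg_index)          # stage 1: modify the mel-spectrogram (hypothetical method)
    waveform = vocoder.inference(mel_conv.squeeze(0).T)   # stage 2: generate a waveform from the converted mel
```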
Audio samples are available here.
- Hirokazu Kameoka, Kou Tanaka, Takuhiro Kaneko, "FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion," arXiv:2104.06900 [cs.SD], 2021. [Paper]
- Hirokazu Kameoka, Kou Tanaka, Damian Kwasny, Takuhiro Kaneko, Nobukatsu Hojo, "ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1849-1863, Jun. 2020. [Paper]
- See `requirements.txt`.
- Set up your training and test sets. The data structure should look like the tree below (a quick sanity-check sketch follows it):

  ```
  /path/to/dataset/training
  ├── spk_1
  │   ├── utt1.wav
  │   ...
  ├── spk_2
  │   ├── utt1.wav
  │   ...
  └── spk_N
      ├── utt1.wav
      ...
  /path/to/dataset/test
  ├── spk_1
  │   ├── utt1.wav
  │   ...
  ├── spk_2
  │   ├── utt1.wav
  │   ...
  └── spk_N
      ├── utt1.wav
      ...
  ```
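  If it helps, a minimal sketch like the following lists each speaker directory and its utterance count so you can verify the layout (`/path/to/dataset` is a placeholder; substitute your actual root):

  ```python
  # Sanity-check the dataset layout described above.
  # "/path/to/dataset" is a placeholder; substitute your actual root.
  from pathlib import Path

  for split in ["training", "test"]:
      root = Path("/path/to/dataset") / split
      for spk_dir in sorted(p for p in root.iterdir() if p.is_dir()):
          n_utt = len(list(spk_dir.glob("*.wav")))
          print(f"{split}/{spk_dir.name}: {n_utt} utterances")
  ```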
- Place a copy of the `parallel_wavegan` directory from https://github.com/kan-bayashi/ParallelWaveGAN in `pwg/`.
- HiFi-GAN models trained on several databases can be found here. Once downloaded, place them in `pwg/egs/`. Please contact me if you have any trouble downloading them.
- Optionally, Parallel WaveGAN can be used instead for waveform generation. The trained models are available here. Once downloaded, place them in `pwg/egs/`. A quick sanity-check sketch for a downloaded vocoder follows this list.
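To confirm that a downloaded vocoder loads and runs, something along the following lines should work with the `parallel_wavegan` package (the checkpoint path below is a placeholder; point it at the file you actually placed under `pwg/egs/`):

```python
# Sanity-check a downloaded vocoder checkpoint with the parallel_wavegan package.
# The checkpoint path is a placeholder; substitute your actual download.
import torch
from parallel_wavegan.utils import load_model

vocoder = load_model("pwg/egs/some_db/checkpoint.pkl")  # loads both HiFi-GAN and Parallel WaveGAN checkpoints
vocoder.remove_weight_norm()
vocoder.eval()

mel = torch.randn(400, 80)  # dummy mel-spectrogram input: (frames, mel bins)
with torch.no_grad():
    wav = vocoder.inference(mel)  # generated waveform
print(wav.shape)
```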
To run all stages for model training, execute:
```bash
./recipes/run_train.sh [-g gpu] [-s stage] [-e exp_name]
```
- Options:

  ```
  -g: GPU device (default: -1)            # -1 indicates CPU
  -s: Stage to start (0 or 1)             # Stage 0: feature extraction (sketched after the examples below); Stage 1: model training
  -e: Experiment name (default: "exp1")   # This name will be used at test time to specify which trained model to load
  ```
- Examples:

  ```bash
  # To run the training from scratch with the default settings:
  ./recipes/run_train.sh
  # To skip the feature extraction stage:
  ./recipes/run_train.sh -s 1
  # To set the GPU device to, say, 0:
  ./recipes/run_train.sh -g 0
  ```
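For reference, stage 0 extracts mel-spectrogram features from the training waveforms. A rough, self-contained illustration of this kind of extraction is sketched below; the parameter values (sampling rate, FFT size, 80 mel bins) are assumptions for illustration, and the recipe's own configuration defines the actual settings:

```python
# Illustrative sketch of stage 0 (mel-spectrogram extraction).
# The parameters here (sampling rate, FFT size, hop length, 80 mel bins)
# are assumptions; the recipe's config defines the actual values.
import librosa
import numpy as np

wav, sr = librosa.load("utt1.wav", sr=22050)
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr, n_fft=1024, hop_length=256, n_mels=80
)
logmel = np.log(np.maximum(mel, 1e-10))  # log-compressed mel-spectrogram
np.save("utt1_mel.npy", logmel.T)        # stored as (frames, mel bins)
```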
See the other scripts in `recipes/` for examples of training on different datasets.
To monitor the training process, use tensorboard:
```bash
tensorboard [--logdir log_path]
```
To perform conversion, execute:
```bash
./recipes/run_test.sh [-g gpu] [-e exp_name] [-c checkpoint] [-a attention_mode] [-v vocoder]
```
- Options:

  ```
  -g: GPU device (default: -1)                # -1 indicates CPU
  -e: Experiment name (e.g., "exp1")
  -c: Model checkpoint to load (default: 0)   # 0 indicates the newest model
  -a: Attention mode ("r", "f", or "d")       # How the attention matrix is processed during conversion
                                              # (see the sketch after the examples below)
                                              # r: raw attention (default)
                                              # f: windowed attention
                                              # d: exactly diagonal attention
  -v: Vocoder type ("hfg" or "pwg")           # Vocoder used for waveform generation
                                              # hfg: HiFi-GAN (default)
                                              # pwg: Parallel WaveGAN
  ```
- Examples:

  ```bash
  # To perform conversion with the default settings:
  ./recipes/run_test.sh -g 0 -e exp1
  # To enable attention windowing:
  ./recipes/run_test.sh -g 0 -e exp1 -a f
  # To use Parallel WaveGAN instead for waveform generation:
  ./recipes/run_test.sh -g 0 -e exp1 -v pwg
  ```
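To illustrate what the three `-a` modes above do to an attention matrix, here is a rough reimplementation sketch (not the repository's exact code; the band `width` and the linear diagonal alignment are illustrative assumptions):

```python
# Sketch of the three attention modes (-a option), applied to a raw
# attention matrix A of shape (trg_frames, src_frames).
# This is an illustrative reimplementation, not the repository's code.
import numpy as np

def process_attention(A, mode="r", width=40):
    T, N = A.shape
    if mode == "r":   # raw attention: use A as predicted
        return A
    if mode == "f":   # windowed: zero out entries far from the diagonal path
        t = np.arange(T)[:, None]
        n = np.arange(N)[None, :]
        mask = np.abs(n - t * N / T) <= width
        A = A * mask
        return A / np.maximum(A.sum(axis=1, keepdims=True), 1e-10)  # renormalize rows
    if mode == "d":   # exactly diagonal: force a linear time alignment
        A = np.zeros((T, N))
        A[np.arange(T), (np.arange(T) * N / T).astype(int)] = 1.0
        return A
    raise ValueError(f"unknown attention mode: {mode}")
```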
If you find this work useful for your research, please cite our papers.
```
@article{Kameoka2021arXiv_FastS2S-VC,
  author={Hirokazu Kameoka and Kou Tanaka and Takuhiro Kaneko},
  journal={arXiv:2104.06900 [cs.SD]},
  title={{FastS2S-VC}: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion},
  year={2021},
  month=apr
}
@article{Kameoka2020IEEETrans_ConvS2S-VC,
  author={Hirokazu Kameoka and Kou Tanaka and Damian Kwasny and Takuhiro Kaneko and Nobukatsu Hojo},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  title={{ConvS2S-VC}: Fully Convolutional Sequence-to-Sequence Voice Conversion},
  year={2020},
  month=jun,
  volume={28},
  pages={1849--1863}
}
```
Hirokazu Kameoka (@kamepong)
E-mail: kame.hirokazu@gmail.com