Zero-Shot Foreign Accent Conversion without a Native Reference

Code for the paper "Zero-Shot Foreign Accent Conversion without a Native Reference".

Waris Quamer, Anurag Das, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna

Note: This code is no longer maintained, but you can find an alternative implementation here (see the Latent Space Conversion Method).

This is a TensorFlow + PyTorch implementation, adapted from the Real-Time Voice Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.

Installation

Python 3.8

Install PyTorch (>=1.0.1).
Install the NVIDIA build of TensorFlow 1.15.
Install ffmpeg.
Install Kaldi.
Install PyKaldi.
Run pip install -r requirements.txt to install the remaining necessary packages.
Download pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory.
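
Before running anything, it can help to confirm that the pieces listed above are visible from the current environment. The following is a minimal sketch, not part of this repository: the function name check_prerequisites is hypothetical, and the module names torch and tensorflow are the usual import names for PyTorch and TensorFlow. It only checks that things can be found, not that the versions match (apart from Python 3.8), and it does not check Kaldi or PyKaldi.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(python_version=(3, 8),
                        tools=("ffmpeg",),
                        modules=("torch", "tensorflow")):
    """Return a dict mapping each requirement name to True or False.

    Hypothetical helper: it reports what is discoverable, nothing more.
    """
    report = {
        # The README lists Python 3.8; compare major.minor only.
        "python": sys.version_info[:2] == tuple(python_version),
    }
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing entry here only indicates which installation step to revisit; it says nothing about whether KALDI_ROOT and PRETRAIN_ROOT are set correctly in kaldi_scripts/extract_features_kaldi.sh.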

Dataset

Acoustic Model: LibriSpeech. Download the pretrained TDNN-F acoustic model here.
- You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
Speaker Encoder: LibriSpeech, see here for detailed training process.
Accent Encoder: Speech Accent Archive. You can use the subset that I collected here.
Synthesizer and Translator (i.e., Seq2seq model): ARCTIC and L2-ARCTIC. Please see here for a merged version.
Vocoder: LibriSpeech, see here for detailed training process.

All the pretrained models are available here.

Quick Start

See the inference script

Training

Use Kaldi to extract bottleneck features (BNF) for the reference L1 speaker

./kaldi_scripts/extract_features_kaldi.sh /path/to/L2-ARCTIC/BDL

Preprocessing

python synthesizer_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
python synthesizer_preprocess_embeds.py your_preprocess_output_dir

python translator_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
python translator_preprocess_embeds.py your_preprocess_output_dir

Training

python translator_train.py PPG2PPG_train your_preprocess_output_dir
python synthesizer_train.py Accetron_train your_preprocess_output_dir
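
The feature extraction, preprocessing, and training commands above can be collected into one ordered list so that paths are filled in once. The sketch below is an illustration only and is not part of this repository: build_pipeline and run_pipeline are hypothetical names, and it assumes (as the example commands suggest) that the Kaldi features live in a kaldi subdirectory of the speaker directory and that both preprocessing steps write to the same output directory. By default it only prints the commands.

```python
import shlex
import subprocess


def build_pipeline(corpus_root, speaker, out_dir):
    """Return the README's commands, in order, as argument lists."""
    speaker_dir = f"{corpus_root}/{speaker}"
    # Assumption: Kaldi features are under <speaker_dir>/kaldi,
    # matching the example paths in the README.
    kaldi_dir = f"{speaker_dir}/kaldi"
    out_flag = f"--out_dir={out_dir}"
    return [
        ["./kaldi_scripts/extract_features_kaldi.sh", speaker_dir],
        ["python", "synthesizer_preprocess_audio.py",
         corpus_root, speaker, kaldi_dir, out_flag],
        ["python", "synthesizer_preprocess_embeds.py", out_dir],
        ["python", "translator_preprocess_audio.py",
         corpus_root, speaker, kaldi_dir, out_flag],
        ["python", "translator_preprocess_embeds.py", out_dir],
        ["python", "translator_train.py", "PPG2PPG_train", out_dir],
        ["python", "synthesizer_train.py", "Accetron_train", out_dir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("/path/to/L2-ARCTIC", "BDL",
                                "your_preprocess_output_dir"))
```

Running the file as is prints the seven commands without executing them; passing dry_run=False to run_pipeline would execute them from the repository root, stopping at the first step that fails.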

warisqr007/ppg2ppg
