ppg2ppg

Zero-Shot Foreign Accent Conversion without a Native Reference

Code for the paper "Zero-Shot Foreign Accent Conversion without a Native Reference".

Waris Quamer, Anurag Das, John Levis, Evgeny Chukharev-Hudilainen, Ricardo Gutierrez-Osuna

Note: This code is no longer maintained, but you can find an alternate implementation here (see Latent Space Conversion Method).

This is a TensorFlow + PyTorch implementation, adapted from the Real-Time Voice Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.

Installation

  • Install Python 3.8.
  • Install PyTorch (>=1.0.1).
  • Install the Nvidia version of TensorFlow 1.15.
  • Install ffmpeg.
  • Install Kaldi.
  • Install PyKaldi.
  • Run pip install -r requirements.txt to install the remaining necessary packages.
  • Download the pretrained TDNN-F model, extract it, and set PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh to the pretrained model directory.
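
Before running anything, it can help to confirm that the prerequisites above are visible from the Python environment. The following is a minimal sketch, not part of this repository: the helper name check_prerequisites is hypothetical, and it assumes the import names torch, tensorflow, and kaldi (the PyKaldi package) and an ffmpeg executable on PATH. It only checks presence, not versions of the libraries.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(modules=("torch", "tensorflow", "kaldi"), tools=("ffmpeg",)):
    """Return a dict mapping each prerequisite to True if it was found.

    Hypothetical helper: the module and tool names are assumptions based on
    the installation list, not something the repository ships.
    """
    # The README asks for Python 3.8 specifically.
    found = {"python3.8": sys.version_info[:2] == (3, 8)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        found[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        found[tool] = shutil.which(tool) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

Anything reported as MISSING should be installed before moving on to the dataset and training steps.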

Dataset

  • Acoustic Model: LibriSpeech. Download the pretrained TDNN-F acoustic model here.
    • You also need to set KALDI_ROOT and PRETRAIN_ROOT in kaldi_scripts/extract_features_kaldi.sh accordingly.
  • Speaker Encoder: LibriSpeech, see here for detailed training process.
  • Accent Encoder: Speech Accent Archive. You can use the subset that I collected here.
  • Synthesizer and Translator (i.e., Seq2seq model): ARCTIC and L2-ARCTIC. Please see here for a merged version.
  • Vocoder: LibriSpeech, see here for detailed training process.
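
Setting KALDI_ROOT and PRETRAIN_ROOT can be done by hand in an editor. If you prefer to script it, the sketch below shows one way, assuming the script defines both variables as plain NAME=value assignments at the start of a line (this layout is an assumption; check kaldi_scripts/extract_features_kaldi.sh first). The function set_shell_vars and the example paths are hypothetical.

```python
import re
import shlex


def set_shell_vars(script_text, values):
    """Rewrite `NAME=...` assignments in the text of a shell script.

    Assumes each variable is assigned on its own line, optionally after
    `export`. Raises KeyError if a variable is never assigned.
    """
    for name, value in values.items():
        pattern = re.compile(
            rf"^([ \t]*(?:export[ \t]+)?){re.escape(name)}=.*$", re.MULTILINE
        )
        # shlex.quote keeps paths with spaces valid; simple paths are unchanged.
        new_line = f"{name}={shlex.quote(value)}"
        # A function replacement avoids backslash handling in the new value.
        script_text, count = pattern.subn(lambda m: m.group(1) + new_line, script_text)
        if count == 0:
            raise KeyError(f"{name} is not assigned in the script")
    return script_text
```

A possible use is to read the script with pathlib.Path("kaldi_scripts/extract_features_kaldi.sh").read_text(), pass the text through set_shell_vars with your own Kaldi and model directories, and write the result back with write_text().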

All the pretrained models are available here.

Quick Start

See the inference script

Training

  • Use Kaldi to extract bottleneck features (BNFs) for the reference L1 speaker:
    ./kaldi_scripts/extract_features_kaldi.sh /path/to/L2-ARCTIC/BDL
  • Preprocessing:
    python synthesizer_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
    python synthesizer_preprocess_embeds.py your_preprocess_output_dir

    python translator_preprocess_audio.py /path/to/L2-ARCTIC BDL /path/to/L2-ARCTIC/BDL/kaldi --out_dir=your_preprocess_output_dir
    python translator_preprocess_embeds.py your_preprocess_output_dir
  • Training:
    python translator_train.py PPG2PPG_train your_preprocess_output_dir
    python synthesizer_train.py Accetron_train your_preprocess_output_dir
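
The training steps above can also be driven from one small script. This is a sketch only: build_pipeline and run_pipeline are hypothetical helpers that assemble the same commands listed above, in the same order, and assume the Kaldi features live in a kaldi directory under the speaker folder, as the example paths suggest. With dry_run=True it only prints the commands.

```python
import shlex
import subprocess


def build_pipeline(corpus_root, speaker, out_dir):
    """Return the README's training commands as argument lists, in order."""
    speaker_dir = f"{corpus_root}/{speaker}"
    kaldi_dir = f"{speaker_dir}/kaldi"  # assumed output location of the Kaldi step
    preprocess_args = [corpus_root, speaker, kaldi_dir, f"--out_dir={out_dir}"]
    return [
        ["./kaldi_scripts/extract_features_kaldi.sh", speaker_dir],
        ["python", "synthesizer_preprocess_audio.py", *preprocess_args],
        ["python", "synthesizer_preprocess_embeds.py", out_dir],
        ["python", "translator_preprocess_audio.py", *preprocess_args],
        ["python", "translator_preprocess_embeds.py", out_dir],
        ["python", "translator_train.py", "PPG2PPG_train", out_dir],
        ["python", "synthesizer_train.py", "Accetron_train", out_dir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("/path/to/L2-ARCTIC", "BDL", "your_preprocess_output_dir"))
```

Run it from the repository root so the relative script paths resolve, and pass dry_run=False once the printed commands look right.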