
Speaker-independent-emotional-voice-conversion-based-on-conditional-VAW-GAN-and-CWT

This is the implementation of the Interspeech 2020 paper "Converting anyone's emotion: towards speaker-independent emotional voice conversion". Please cite our paper if you use this code.

Getting Started

Prerequisites

  • Ubuntu 16.04
  • Python 3.6
  • TensorFlow-GPU 1.5.0
  • PyWorld
  • librosa
  • soundfile
  • NumPy 1.14.0
  • scikit-learn
  • glob
  • sprocket-vc
  • pycwt
  • SciPy
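The Python dependencies above can be collected into a possible requirements file. Only the versions pinned in the list are pinned here; `glob` is omitted because it ships with the Python standard library, and `sklearn` is published on PyPI as `scikit-learn`:

```text
tensorflow-gpu==1.5.0
pyworld
librosa
soundfile
numpy==1.14.0
scikit-learn
sprocket-vc
pycwt
scipy
```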

Usage

  1. Prepare your dataset.
Please follow the file structure:

training_dir: ./data/wav/training_set/*/*.wav

evaluation_dir: ./data/wav/evaluation_set/*/*.wav

For example: "./data/wav/training_set/Angry/0001.wav"
  2. Activate your virtual environment.
source activate [your env]
  3. Train the VAW-GAN for prosody.
./train_f0.sh
# Remember to change the source and target directories in "architecture-vawgan-vcc2016.json"
  4. Train the VAW-GAN for spectrum.
./train_sp.sh
# Remember to change the source and target directories in "architecture-vawgan-vcc2016.json"
  5. Generate the converted emotional speech.
./convert.sh
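The directory layout expected in step 1 can be sanity-checked with a short script before training. The temporary directory and emotion folder names below are illustrative, following the "./data/wav/training_set/Angry/0001.wav" example:

```python
import glob
import os
import tempfile

# Build a throwaway copy of the expected layout:
# <root>/data/wav/training_set/<Emotion>/<id>.wav
root = tempfile.mkdtemp()
for emotion in ["Angry", "Neutral"]:
    d = os.path.join(root, "data", "wav", "training_set", emotion)
    os.makedirs(d)
    for i in range(2):
        open(os.path.join(d, f"{i:04d}.wav"), "wb").close()

# Enumerate wav files with the same glob pattern the README uses.
wavs = sorted(glob.glob(os.path.join(root, "data", "wav", "training_set", "*", "*.wav")))

# Group files by the emotion label encoded in the parent directory name.
by_emotion = {}
for path in wavs:
    emotion = os.path.basename(os.path.dirname(path))
    by_emotion.setdefault(emotion, []).append(path)

print({k: len(v) for k, v in by_emotion.items()})
```

Running the same glob against your real `./data/wav` tree shows at a glance whether every emotion folder is populated.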

Note: This code is based on the VAW-GAN voice conversion implementation: https://github.com/JeremyCCHsu/vae-npvc/tree/vawgan
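The prosody branch decomposes the F0 contour with a continuous wavelet transform (the repository lists pycwt for this). As a library-free illustration of the idea only, here is a minimal pure-NumPy CWT with a Mexican-hat (Ricker) wavelet applied to a toy F0 contour; the wavelet choice, scales, and contour are assumptions for the sketch, not the repository's exact configuration:

```python
import numpy as np

def mexican_hat(t, s):
    """Ricker (Mexican-hat) wavelet at scale s, evaluated at samples t."""
    x = t / s
    return (2.0 / (np.sqrt(3.0 * s) * np.pi ** 0.25)) * (1.0 - x ** 2) * np.exp(-x ** 2 / 2.0)

def cwt(signal, scales):
    """Naive CWT: convolve the signal with the wavelet at each scale."""
    out = np.zeros((len(scales), len(signal)))
    for i, s in enumerate(scales):
        # Window of +/- 4 scales; kept shorter than the signal so that
        # np.convolve(..., mode="same") returns len(signal) samples.
        t = np.arange(-int(4 * s), int(4 * s) + 1)
        out[i] = np.convolve(signal, mexican_hat(t, s), mode="same")
    return out

# Toy "F0 contour": a slow phrase-level trend plus fast local modulation.
t = np.linspace(0.0, 1.0, 200)
f0 = 120.0 + 20.0 * np.sin(2 * np.pi * 2 * t) + 5.0 * np.sin(2 * np.pi * 20 * t)

# Zero-mean the contour before decomposing into 4 scales.
coeffs = cwt(f0 - f0.mean(), scales=[2, 4, 8, 16])
print(coeffs.shape)
```

Each row of `coeffs` captures F0 variation at one temporal scale, which is what lets the prosody model treat slow phrase-level and fast segment-level pitch movements separately.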