This is the implementation of the Interspeech 2020 paper "Converting anyone's emotion: towards speaker-independent emotional voice conversion". Please kindly cite our paper if you use this code.
- Ubuntu 16.04
- Python 3.6
- Tensorflow-gpu 1.5.0
- PyWorld
- librosa
- soundfile
- numpy 1.14.0
- sklearn
- glob
- sprocket-vc
- pycwt
- scipy
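One way to set up the environment is to collect the list above into a requirements file and install it with pip. Only the Tensorflow and numpy versions are pinned by this README; everything else is left open, so treat this as a sketch:

```python
from pathlib import Path

# Dependency list from the README. Version pins other than tensorflow-gpu
# and numpy are not specified there. "glob" is part of the Python standard
# library, so it needs no installation.
deps = [
    "tensorflow-gpu==1.5.0",
    "numpy==1.14.0",
    "pyworld",
    "librosa",
    "soundfile",
    "scikit-learn",
    "sprocket-vc",
    "pycwt",
    "scipy",
]
Path("requirements.txt").write_text("\n".join(deps) + "\n")
# Then install inside your (activated) environment with:
#   pip install -r requirements.txt
```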
- Prepare your dataset.
Please follow the file structure:
training_dir: ./data/wav/training_set/*/*.wav
evaluation_dir: ./data/wav/evaluation_set/*/*.wav
For example: "./data/wav/training_set/Angry/0001.wav"
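You can sanity-check your layout with the same wildcard pattern before training. The snippet below builds a small mock tree in a temporary directory just so it runs on its own; the emotion folder names (Angry, Neutral) are illustrative:

```python
import glob
import os
import tempfile

# Mock the expected layout: ./data/wav/training_set/<Emotion>/<id>.wav
root = tempfile.mkdtemp()
for emotion, count in [("Angry", 2), ("Neutral", 3)]:
    emo_dir = os.path.join(root, "data", "wav", "training_set", emotion)
    os.makedirs(emo_dir)
    for i in range(count):
        # Create empty placeholder files named like "0001.wav"
        open(os.path.join(emo_dir, "{:04d}.wav".format(i + 1)), "wb").close()

# Same */*.wav pattern as in the file structure above
pattern = os.path.join(root, "data", "wav", "training_set", "*", "*.wav")
wavs = sorted(glob.glob(pattern))
print(len(wavs))  # 5 placeholder files across both emotion folders
```

On your real dataset, point `root` at the repository directory and check that every emotion folder is picked up.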
- Activate your virtual environment.
source activate [your env]
- Train VAW-GAN for prosody.
./train_f0.sh
# Remember to change the source and target dir in "architecture-vawgan-vcc2016.json"
- Train VAW-GAN for spectrum.
./train_sp.sh
# Remember to change the source and target dir in "architecture-vawgan-vcc2016.json"
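The JSON edit mentioned in the two notes above can also be scripted. The key names used here ("training", "src_dir", "trg_dir") are assumptions for illustration; open architecture-vawgan-vcc2016.json to see the actual field names. The snippet writes a minimal stand-in config first so it is runnable on its own:

```python
import json
from pathlib import Path

cfg_path = Path("architecture-vawgan-vcc2016.json")

# Minimal stand-in config; in the repo this file already exists.
cfg_path.write_text(json.dumps({"training": {"src_dir": "", "trg_dir": ""}}))

# Point the (assumed) source/target fields at the emotion folders
# you want to train on, then re-run the training script.
cfg = json.loads(cfg_path.read_text())
cfg["training"]["src_dir"] = "./data/wav/training_set/Neutral"  # source emotion
cfg["training"]["trg_dir"] = "./data/wav/training_set/Angry"    # target emotion
cfg_path.write_text(json.dumps(cfg, indent=2))
```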
- Generate the converted emotional speech.
./convert.sh
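For intuition about the prosody side of the conversion, a classic non-neural baseline is log-Gaussian F0 normalization, which maps the source speaker's log-F0 statistics onto the target's. This is only an illustration of the idea; the paper itself models F0 with a VAW-GAN, not with this formula:

```python
import numpy as np

def convert_f0_log_gaussian(f0_src, src_stats, trg_stats):
    """Map voiced source F0 frames to the target's log-F0 statistics.

    src_stats / trg_stats are (mean, std) of log-F0. Unvoiced frames
    (f0 == 0) are left at zero. This is the log-Gaussian normalization
    baseline, shown only to illustrate prosody conversion.
    """
    f0 = np.asarray(f0_src, dtype=float)
    out = np.zeros_like(f0)
    voiced = f0 > 0
    mu_s, std_s = src_stats
    mu_t, std_t = trg_stats
    out[voiced] = np.exp((np.log(f0[voiced]) - mu_s) / std_s * std_t + mu_t)
    return out

# Toy contour with assumed statistics: source around 125 Hz, target around 220 Hz.
f0 = np.array([0.0, 120.0, 130.0, 0.0, 125.0])
converted = convert_f0_log_gaussian(f0, (np.log(125.0), 0.1), (np.log(220.0), 0.15))
print(converted.round(1))
```

A source frame exactly at the source mean (125 Hz) maps to the target mean (220 Hz), and unvoiced frames stay unvoiced.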
Note: The code is based on the VAW-GAN voice conversion implementation: https://github.com/JeremyCCHsu/vae-npvc/tree/vawgan