This is the implementation of the Interspeech 2020 paper "Converting anyone's emotion: towards speaker-independent emotional voice conversion". Please kindly cite our paper if you use this code.
- Ubuntu 16.04
- Python 3.6
- Tensorflow-gpu 1.5.0
- PyWorld
- librosa
- soundfile
- numpy 1.14.0
- sklearn
- glob
- sprocket-vc
- pycwt
- scipy
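One way to set up the environment is to collect the list above into a requirements file and install it with pip. Only the Tensorflow and numpy versions are pinned by this README; everything else is left open, so treat this as a sketch:

```python
from pathlib import Path

# Dependency list from the README. Version pins other than tensorflow-gpu
# and numpy are not specified there. "glob" is part of the Python standard
# library, so it needs no installation.
deps = [
    "tensorflow-gpu==1.5.0",
    "numpy==1.14.0",
    "pyworld",
    "librosa",
    "soundfile",
    "scikit-learn",
    "sprocket-vc",
    "pycwt",
    "scipy",
]
Path("requirements.txt").write_text("\n".join(deps) + "\n")
# Then install inside your (activated) environment with:
#   pip install -r requirements.txt
```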
- Prepare your dataset.
Please follow the file structure:
training_dir: ./data/wav/training_set/*/*.wav
evaluation_dir: ./data/wav/evaluation_set/*/*.wav
For example: "./data/wav/training_set/Angry/0001.wav"
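You can sanity-check your layout with the same wildcard pattern before training. The snippet below builds a small mock tree in a temporary directory just so it runs on its own; the emotion folder names (Angry, Neutral) are illustrative:

```python
import glob
import os
import tempfile

# Mock the expected layout: ./data/wav/training_set/<Emotion>/<id>.wav
root = tempfile.mkdtemp()
for emotion, count in [("Angry", 2), ("Neutral", 3)]:
    emo_dir = os.path.join(root, "data", "wav", "training_set", emotion)
    os.makedirs(emo_dir)
    for i in range(count):
        # Create empty placeholder files named like "0001.wav"
        open(os.path.join(emo_dir, "{:04d}.wav".format(i + 1)), "wb").close()

# Same */*.wav pattern as in the file structure above
pattern = os.path.join(root, "data", "wav", "training_set", "*", "*.wav")
wavs = sorted(glob.glob(pattern))
print(len(wavs))  # 5 placeholder files across both emotion folders
```

On your real dataset, point `root` at the repository directory and check that every emotion folder is picked up.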
- Activate your virtual environment.
source activate [your env]
- Train VAW-GAN for prosody.
./train_f0.sh
# Remember to change the source and target dir in "architecture-vawgan-vcc2016.json"
- Train VAW-GAN for spectrum.
./train_sp.sh
# Remember to change the source and target dir in "architecture-vawgan-vcc2016.json"
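The JSON edit mentioned in the two notes above can also be scripted. The key names used here ("training", "src_dir", "trg_dir") are assumptions for illustration; open architecture-vawgan-vcc2016.json to see the actual field names. The snippet writes a minimal stand-in config first so it is runnable on its own:

```python
import json
from pathlib import Path

cfg_path = Path("architecture-vawgan-vcc2016.json")

# Minimal stand-in config; in the repo this file already exists.
cfg_path.write_text(json.dumps({"training": {"src_dir": "", "trg_dir": ""}}))

# Point the (assumed) source/target fields at the emotion folders
# you want to train on, then re-run the training script.
cfg = json.loads(cfg_path.read_text())
cfg["training"]["src_dir"] = "./data/wav/training_set/Neutral"  # source emotion
cfg["training"]["trg_dir"] = "./data/wav/training_set/Angry"    # target emotion
cfg_path.write_text(json.dumps(cfg, indent=2))
```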
- Generate the converted emotional speech.
./convert.sh
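For intuition about the prosody side of the conversion, a classic non-neural baseline is log-Gaussian F0 normalization, which maps the source speaker's log-F0 statistics onto the target's. This is only an illustration of the idea; the paper itself models F0 with a VAW-GAN, not with this formula:

```python
import numpy as np

def convert_f0_log_gaussian(f0_src, src_stats, trg_stats):
    """Map voiced source F0 frames to the target's log-F0 statistics.

    src_stats / trg_stats are (mean, std) of log-F0. Unvoiced frames
    (f0 == 0) are left at zero. This is the log-Gaussian normalization
    baseline, shown only to illustrate prosody conversion.
    """
    f0 = np.asarray(f0_src, dtype=float)
    out = np.zeros_like(f0)
    voiced = f0 > 0
    mu_s, std_s = src_stats
    mu_t, std_t = trg_stats
    out[voiced] = np.exp((np.log(f0[voiced]) - mu_s) / std_s * std_t + mu_t)
    return out

# Toy contour with assumed statistics: source around 125 Hz, target around 220 Hz.
f0 = np.array([0.0, 120.0, 130.0, 0.0, 125.0])
converted = convert_f0_log_gaussian(f0, (np.log(125.0), 0.1), (np.log(220.0), 0.15))
print(converted.round(1))
```

A source frame exactly at the source mean (125 Hz) maps to the target mean (220 Hz), and unvoiced frames stay unvoiced.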
Note: The code is based on the VAW-GAN voice conversion implementation: https://github.com/JeremyCCHsu/vae-npvc/tree/vawgan