/emotionalVC

VAE-WGAN for emotional voice conversion

Primary language: Python. License: MIT.

Implementation of VAWGAN (conditional VAE-WGAN) for non-parallel voice conversion in PyTorch, repurposed for emotional voice conversion on the SAVEE dataset.

VAWGAN Paper

Dependencies

  • Python 3.6
    • tensorflow >= 1.5.0
    • PyTorch >= 0.4.0
    • PyWorld
    • librosa
    • soundfile

Usage

# feature extraction
python analyzer.py \
--dir_to_wav dataset/savee/wav \
--dir_to_bin dataset/savee/bin

# collect stats
python build.py \
--train_file_pattern "dataset/savee/bin/*/*.bin" \
--corpus_name savee

# training
python train.py architecture-vawgan-savee.json savee

# conversion
python convert.py architecture-vawgan-savee.json savee THap
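
Before running build.py, it can help to confirm that the glob pattern passed to --train_file_pattern actually matches the extracted feature files. The following is a minimal sketch, not part of this repository: count_matches is a hypothetical helper, and it assumes only what the pattern itself implies, namely that the .bin files sit one directory level below dataset/savee/bin.

```python
import glob
import os
from collections import Counter


def count_matches(pattern):
    """Count files matching a glob pattern, grouped by parent directory name."""
    counts = Counter()
    for path in glob.glob(pattern):
        # The parent directory is the "*" level in "bin/*/*.bin".
        counts[os.path.basename(os.path.dirname(path))] += 1
    return dict(counts)


if __name__ == "__main__":
    # Same pattern as in the build.py command above.
    matches = count_matches("dataset/savee/bin/*/*.bin")
    for subdir, n in sorted(matches.items()):
        print(f"{subdir}: {n} files")
    if not matches:
        print("No files matched; check the output directory of analyzer.py.")
```

If a subdirectory is missing from the output or the counts look wrong, rerun the feature extraction step before collecting stats.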

Adjust the model and training parameters in the architecture JSON file (architecture-vawgan-savee.json in the commands above).

Note

  1. Only one-to-one voice conversion (VC) is supported.
  2. Setting epoch_vawgan to 0 in the architecture JSON file results in a one-to-one VAE.
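
To try the VAE-only setting from note 2 without editing the original file, one option is to write a modified copy of the architecture JSON. The following is a rough sketch, not part of this repository. It does not assume where epoch_vawgan sits in the file, so it searches the whole structure. The helper names set_key and write_vae_only_config and the output file name architecture-vae-savee.json are hypothetical.

```python
import json


def set_key(node, key, value):
    """Set every occurrence of `key` in nested dicts/lists; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                changed += 1
            else:
                changed += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            changed += set_key(item, key, value)
    return changed


def write_vae_only_config(src, dst):
    """Copy the architecture file at `src` to `dst` with epoch_vawgan set to 0."""
    with open(src) as f:
        arch = json.load(f)
    if set_key(arch, "epoch_vawgan", 0) == 0:
        raise KeyError("epoch_vawgan not found in " + src)
    with open(dst, "w") as f:
        json.dump(arch, f, indent=2)


# Example call (the output file name is a hypothetical choice):
# write_vae_only_config("architecture-vawgan-savee.json",
#                       "architecture-vae-savee.json")
```

The copy can then be passed to train.py in place of architecture-vawgan-savee.json, following the same command form as in Usage.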