SkipVQVC

Implementation of SkipVQVC with various settings. Skip connections are a powerful technique in deep learning. However, in autoencoder-based voice conversion (VC), skip connections are rarely used. They let the model learn too quickly and overfit to reconstruction, and such a model can no longer perform VC. In this paper, we discuss how quantization can form a bottleneck strong enough for a model with skip connections to perform VC.
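To make the quantization bottleneck concrete, here is a minimal sketch of a generic nearest-neighbour vector quantization step. It is not the repository's actual layer: the function name `quantize`, the feature dimension, and the random stand-in data are all assumptions for illustration. Each frame is replaced by its closest codebook vector, so with 64 codes a frame can carry at most log2(64) = 6 bits through the bottleneck.

```python
import numpy as np

def quantize(z, codebook):
    """Replace each row of z (T, D) with its nearest row of codebook (K, D)."""
    # Squared Euclidean distance between every frame and every code: shape (T, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)          # index of the closest code per frame
    return codebook[indices], indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 16))        # 64 codes (as with -n 64); dim 16 is arbitrary
frames = rng.normal(size=(100, 16))         # stand-in for 100 encoder output frames
quantized, indices = quantize(frames, codebook)
```

A smaller codebook gives a tighter bottleneck, which is presumably why the codebook size is exposed as a training option.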

Preprocessing

python preprocessing.py [input_dir (VCTK/wav48)] [output_dir npy dir]

File architecture

# File 
- SkipVQVC
  |- logger (some utils used for TensorBoard)
  |  |.
  |
  |- trainer (different trainers have different properties)
  |  |- train_normal.py
  |  |- train_rhythm.py (splits speech into a rhythm factor; should be used with the vqvc+_rhythm model)
  |  |- train_mean_std.py (trains with input normalized by mean and std)
  |
  |- model (different models, such as normal, speaker VAE, rhythm)
  | |- .
  | |- .
  |
  |- utils
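
As an illustration of what "input normalized by mean and std" could look like, here is a minimal sketch. It assumes a mel spectrogram of shape (n_mels, T) and per-bin statistics computed across time; train_mean_std.py may well use different statistics (for example per speaker or dataset-wide). The function names are hypothetical.

```python
import numpy as np

def normalize_mean_std(mel, eps=1e-8):
    """Normalize a (n_mels, T) array to zero mean and unit std per mel bin."""
    mean = mel.mean(axis=1, keepdims=True)   # one mean per mel bin
    std = mel.std(axis=1, keepdims=True)     # one std per mel bin
    return (mel - mean) / (std + eps), mean, std

def denormalize(normed, mean, std, eps=1e-8):
    """Invert normalize_mean_std using the stored statistics."""
    return normed * (std + eps) + mean

rng = np.random.default_rng(0)
mel = rng.normal(loc=-4.0, scale=2.0, size=(80, 200))  # stand-in data, not real speech
normed, mean, std = normalize_mean_std(mel)
restored = denormalize(normed, mean, std)
```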

Training config

  • -train_dir: your training directory
  • -test_dir: your testing directory (unseen speakers)
  • -m: which model in model/* to use (for example: vqvc+)
  • -n: number of vectors in the codebook
  • -ch: number of channels in the encoder and decoder
  • -t: which trainer in trainer/* to use (for example: train_normal)
  • --load_checkpoint: whether to load a checkpoint, if one exists in the checkpoint dir (for example: True)
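
The options above could be declared with argparse roughly as follows. This is a sketch, not the actual train.py: the flag names come from the list above, while the defaults, the `str2bool` helper, and which flags are required are assumptions.

```python
import argparse

def str2bool(value):
    """Hypothetical helper: interpret strings such as 'True' as booleans."""
    return value.lower() in ("true", "1", "yes")

def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the training options")
    parser.add_argument("-train_dir", required=True, help="training directory")
    parser.add_argument("-test_dir", default=None, help="testing directory (unseen speakers)")
    parser.add_argument("-m", default="vqvc+", help="model name in model/*")
    parser.add_argument("-n", type=int, default=64, help="number of codebook vectors")
    parser.add_argument("-ch", type=int, default=64, help="encoder/decoder channels")
    parser.add_argument("-t", default="train_normal", help="trainer name in trainer/*")
    parser.add_argument("--load_checkpoint", type=str2bool, default=False,
                        help="load a matching checkpoint if one exists")
    return parser

# Parse a command line with the same flags as a typical training run
args = build_parser().parse_args(
    "-train_dir /homes/aa/mel/mel.melgan -m vqvc+ -n 128 -ch 128 -t train_normal".split()
)
```

With this layout the values are available as `args.train_dir`, `args.m`, `args.n`, `args.ch`, `args.t`, and `args.load_checkpoint`.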

The checkpoint and output directories are generated automatically from your model, trainer, n_embed, and channel settings. When loading a checkpoint, the files that match these settings are loaded automatically.

Example

python train.py -train_dir /homes/aa/mel/mel.melgan -m vqvc+ -n 128 -ch 128 -t train_normal
--> "Saving model and optimizer state at iteration 0 to checkpoint/vqvc+_n128_ch128_train_normal/gen"
--> "Saving model and optimizer state at iteration 100 to checkpoint/vqvc+_n128_ch128_train_normal/gen"

Tensorboard

tensorboard --logdir output/vqvc+_n128_ch128_train_normal
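
The directory names in the log lines and in the tensorboard command follow a visible pattern: model, codebook size, channels, and trainer joined together. A minimal sketch that reproduces the pattern is shown below; the helper names `run_name` and `run_dirs` are hypothetical, and the repository may build these paths differently.

```python
def run_name(model, n_embed, channel, trainer):
    """Build the run name seen in the example paths, e.g. vqvc+_n128_ch128_train_normal."""
    return f"{model}_n{n_embed}_ch{channel}_{trainer}"

def run_dirs(model, n_embed, channel, trainer):
    """Return (checkpoint_dir, output_dir) for one combination of settings."""
    name = run_name(model, n_embed, channel, trainer)
    return f"checkpoint/{name}", f"output/{name}"

checkpoint_dir, output_dir = run_dirs("vqvc+", 128, 128, "train_normal")
```

Because the name encodes every setting, two runs with different settings never share a checkpoint directory, which is what makes automatic checkpoint loading by setting possible.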

The whole model is still under investigation to find the best parameters.

# if you want to reproduce the results in the paper.
python train.py -train_dir your-path-to-npy-dir -m vqvc+ -n 64 -ch 64 -t train_normal

# if you want to train with rhythm information (to adjust rhythm)
python train.py -train_dir your-path-to-npy-dir -m vqvc+_rhythm -n 128 -ch 128 -t train_rhythm

# if you find that normal training is not very good for one-shot, you can train the resample model.
# It resamples the quantized code, which eliminates more speaker information from the content

python train.py -train_dir your-path-to-npy-dir -m vqvc+_resample -n 512 -ch 512 -t train_normal

# We find that normalization on the embedding space improves the result; you can try this
python train.py -train_dir your-path-to-npy-dir -m vqvc+ -n 64 -ch 512 -t train_simple_normalize


# Still under investigation: speaker quantize <--> cav on speaker embedding

Some details

All models are wrapped by vq_model(); details can be found in model/vqvc*
All trainers are wrapped by train_(); details can be found in trainer/train*