torch-nansypp

Torch implementation of NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis, [openreview]

TODO

  1. breathiness perturbation
  2. DEMAND-based noise addition (see the sketch below)
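
Item 2 amounts to mixing DEMAND recordings into the clean speech at a random SNR. Below is a minimal sketch of the mixing step, assuming 1-D mono waveforms already resampled to a common rate; the function is an illustration, not part of this repository yet.

import torch

def add_noise(speech: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Mix a noise clip into speech at the given SNR (in dB)."""
    # tile or crop the noise to match the speech length
    if noise.size(-1) < speech.size(-1):
        repeats = speech.size(-1) // noise.size(-1) + 1
        noise = noise.repeat(repeats)
    noise = noise[..., :speech.size(-1)]
    # scale the noise so that 10 * log10(P_speech / P_noise) == snr_db
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-7)
    gain = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise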

Requirements

Tested in a Python 3.7.9 conda environment.
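
For example (the environment name is arbitrary):

conda create -n nansypp python=3.7.9
conda activate nansypp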

Usage

Initialize the submodule.

git submodule update --init

Download the LibriTTS [openslr:60], LibriSpeech [openslr:12], and VCTK [official] datasets.
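
For example, the OpenSLR archives can be fetched directly; the subsets below are illustrative, adjust them to the splits you train on.

wget https://www.openslr.org/resources/60/train-clean-100.tar.gz
wget https://www.openslr.org/resources/12/train-clean-100.tar.gz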

Dump the dataset for training.

python -m speechset.utils.dump \
    --out-dir ./datasets/dumped

To train the model, run train.py.

python train.py

To resume training from a previous checkpoint, use --load-epoch.

python train.py \
    --load-epoch 20 \
    --config ./ckpt/t1.json

Checkpoints will be written to TrainConfig.ckpt, and TensorBoard summaries to TrainConfig.log.

tensorboard --logdir ./log

[TODO] To run inference, use inference.py.

[TODO] Pretrained checkpoints will be released on the releases page.

To use a pretrained model, download the files and unzip them. The following is a sample script.

import torch

from nansypp import Nansypp

# restore the pretrained model from the checkpoint
ckpt = torch.load('t1_200.ckpt', map_location='cpu')
nansypp = Nansypp.load(ckpt)
nansypp.eval()
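
Until inference.py is available, end-to-end usage would look roughly like the sketch below; the sample rate constant and the analyze/synthesize method names are hypothetical placeholders for the actual analysis and synthesis entry points.

import torch
import torchaudio

from nansypp import Nansypp

SAMPLE_RATE = 22050  # assumption: check the training config for the real value

# restore the pretrained model from the checkpoint
ckpt = torch.load('t1_200.ckpt', map_location='cpu')
nansypp = Nansypp.load(ckpt)
nansypp.eval()

# load a source utterance and resample it to the model rate
wav, sr = torchaudio.load('source.wav')
wav = torchaudio.functional.resample(wav, sr, SAMPLE_RATE)

with torch.no_grad():
    # hypothetical API: analyze into NANSY++ features, then resynthesize
    features = nansypp.analyze(wav)
    out = nansypp.synthesize(features)

# torchaudio expects a (channels, time) tensor
torchaudio.save('reconstructed.wav', out.cpu(), SAMPLE_RATE)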

[TODO] Learning curve and Figures

[TODO] Samples