/GST-Tacotron-v2

PyTorch implementation of Style Tokens

Primary LanguagePythonMIT LicenseMIT

Tacotron 2

A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis.

Dataset

Aishell Dataset, containing 400 speakers and over 170 hours of Mandarin speech data.

Dependency

  • Python 3.5.2
  • PyTorch 1.0.0

Usage

Data Pre-processing

Extract data_aishell.tgz:

$ python extract.py

Extract wav files into train/dev/test folders:

$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;

Scan transcript data, generate features:

$ python pre_process.py

Now the folder structure under data folder is sth. like:

data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle

Train

$ python train.py

If you want to visualize during training, run in your terminal:

$ tensorboard --logdir runs

Demo

Generate mel-spectrogram for text "相对论直接和间接的催生了量子力学的诞生 也为研究微观世界的高速运动确立了全新的数学模型"

$ python demo.py

image