A PyTorch implementation of Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis.
This project uses the Aishell dataset, which contains 400 speakers and over 170 hours of Mandarin speech data.
- Python 3.5.2
- PyTorch 1.0.0
Extract data_aishell.tgz:
$ python extract.py
Extract wav files into train/dev/test folders:
$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;
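Where `find` and `tar` are not available, the same step can be sketched in Python with only the standard library. This is a minimal sketch and not part of the repository: the function name `extract_nested_archives` is hypothetical, and it assumes each `*.tar.gz` should be unpacked into the directory that contains it, as `-execdir` does.

```python
import tarfile
from pathlib import Path


def extract_nested_archives(root):
    """Unpack every *.tar.gz found under `root` into the archive's own folder.

    Hypothetical helper that mirrors the find/tar command above.
    Returns the list of archive paths that were unpacked.
    """
    # Collect the archives first so files created during extraction
    # do not affect the directory walk.
    archives = sorted(Path(root).rglob("*.tar.gz"))
    for archive in archives:
        with tarfile.open(str(archive), "r:gz") as tar:
            # Only unpack archives from a source you trust.
            tar.extractall(path=str(archive.parent))
    return archives
```

Calling `extract_nested_archives("data/data_aishell/wav")` from the repository root would then play the role of the two shell commands above.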
Scan transcript data, generate features:
$ python pre_process.py
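To illustrate the transcript scan, the sketch below parses transcript lines into a mapping from utterance ID to text. It is not the code in pre_process.py. It assumes each line of aishell_transcript_v0.8.txt holds an utterance ID followed by whitespace-separated tokens, and the function name `parse_transcript` is hypothetical.

```python
def parse_transcript(lines):
    """Map utterance ID -> transcript text with the token spacing removed.

    Assumed line format: "<utterance_id> <token> <token> ...".
    Blank lines and lines that carry an ID but no text are skipped.
    """
    transcripts = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        utt_id, tokens = parts[0], parts[1:]
        transcripts[utt_id] = "".join(tokens)
    return transcripts
```

Any iterable of strings works as input, so an open file handle (opened with `encoding="utf-8"`) can be passed directly.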
The folder structure under the data folder should now look something like:
data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle
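Before training, it can help to confirm that the pre-processing steps produced this layout. The following is a minimal sketch, not part of the repository: `missing_paths` and `EXPECTED` are hypothetical names, and the nesting of the paths is an assumption read off the listing above.

```python
from pathlib import Path

# Entries taken from the layout listed above, relative to the data folder.
# The nesting is an assumption based on that listing.
EXPECTED = [
    "data_aishell/transcript/aishell_transcript_v0.8.txt",
    "data_aishell/wav/train",
    "data_aishell/wav/dev",
    "data_aishell/wav/test",
    "aishell.pickle",
]


def missing_paths(data_dir="data", expected=EXPECTED):
    """Return the expected entries that do not exist under `data_dir`."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_paths()` suggests the extraction and pre-processing steps completed. Any names it returns point to the step worth re-running.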
$ python train.py
To visualize training progress, run in your terminal:
$ tensorboard --logdir runs
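Generate a mel-spectrogram for the text "相对论直接和间接的催生了量子力学的诞生 也为研究微观世界的高速运动确立了全新的数学模型" (roughly: "Relativity directly and indirectly gave rise to quantum mechanics, and also established a brand-new mathematical model for studying high-speed motion in the microscopic world"):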
$ python demo.py