An implementation of Tacotron speech synthesis in TensorFlow, modified for a Nepali dataset.
- Install Python 3.
- Install the latest version of TensorFlow for your platform. For better performance, install with GPU support if it's available. This code works with TensorFlow 1.3 and later.
- Install requirements:

  ```
  pip install -r requirements.txt
  ```
- Download and unpack a model. For Nepali, download the Nepali pretrained model checkpoints at 30k, 50k, and 75k iterations; for the best results, use the checkpoint trained for 75k iterations.
- Run the demo server (Nepali demo):

  ```
  python3 demo_server.py --checkpoint <full_path_to_pretrained_model>/model.ckpt-30000
  ```

  Replace "30000" with the iteration count of the checkpoint you downloaded.
- Point your browser at localhost:9000
  - Type what you want to synthesize.
  - For Nepali, use Nepali Unicode sentences. (You can also script requests; see the sketch below.)
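  If you prefer to script the demo instead of typing into the browser form, the sketch below assumes demo_server.py exposes a GET /synthesize endpoint with a `text` parameter that returns WAV audio, as the upstream Tacotron demo server does; adjust if your copy differs.

  ```python
  # Query the running demo server and save the returned audio.
  # Assumption: GET /synthesize?text=... returns WAV bytes (upstream demo server behaviour).
  import requests

  text = 'नमस्ते'  # for the Nepali model, send Nepali Unicode sentences
  resp = requests.get('http://localhost:9000/synthesize', params={'text': text})
  resp.raise_for_status()

  with open('demo.wav', 'wb') as f:
      f.write(resp.content)
  ```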
- Download a speech dataset:
  - ne_np_female (Creative Commons Attribution Share-Alike)
- Unpack the dataset into `~/PycharmProjects/tacotron`. For Nepali TTS the folder should look like this (the extracted folder is renamed to `nepali` for simplicity):

  ```
  tacotron
    |- nepali
        |- line_index.tsv
        |- wavs
  ```
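  Before preprocessing, a quick sanity check of this layout can save time. The sketch below relies only on the folder names shown above (line_index.tsv and wavs/ under nepali/) and compares the transcript count to the wav count:

  ```python
  # Rough sanity check of the dataset layout described above.
  # Adjust the path if you cloned to a different directory.
  import os

  root = os.path.expanduser('~/PycharmProjects/tacotron/nepali')

  with open(os.path.join(root, 'line_index.tsv'), encoding='utf-8') as f:
      n_lines = sum(1 for line in f if line.strip())

  wav_dir = os.path.join(root, 'wavs')
  n_wavs = sum(1 for name in os.listdir(wav_dir) if name.endswith('.wav'))

  # Expect roughly one transcript line per wav file.
  print(f'{n_lines} transcript lines, {n_wavs} wav files')
  ```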
- Preprocess the data:

  ```
  python3 preprocess.py --dataset nepali
  ```

  - For the Nepali dataset, hparams.py is set to `cleaners='transliteration_cleaners'`. If you are using another dataset, change it back to the default.
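  To see what this cleaner does to Nepali input, you can call it directly. This is a sketch that assumes the upstream text/cleaners.py module is present (run it from the repository root); the exact romanization depends on the unidecode package:

  ```python
  # Preview the transliteration cleaner on a Nepali Unicode sentence.
  # Assumption: text/cleaners.py from the upstream Tacotron code is unchanged.
  from text.cleaners import transliteration_cleaners

  print(transliteration_cleaners('नमस्ते संसार'))
  # Expect a lowercase ASCII transliteration (the precise spelling comes from unidecode).
  ```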
- Train a model:

  ```
  python3 train.py
  ```

  Tunable hyperparameters are found in hparams.py. You can adjust these at the command line using the `--hparams` flag, for example `--hparams="batch_size=16,outputs_per_step=2"`. Hyperparameters should generally be set to the same values at both training and eval time.
  - For the Nepali dataset, use `python3 train.py --hparams="max_iters=300"`. See "Notes and Common Issues" for details.
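  The override string is parsed into the hparams object before training starts. As a rough sketch (assuming hparams.py builds a tf.contrib.training.HParams object, as in the upstream code, so this requires TensorFlow 1.x):

  ```python
  # Show how a --hparams override string changes the values used for training.
  # Assumption: hparams.py exposes `hparams` as a tf.contrib.training.HParams object.
  from hparams import hparams

  hparams.parse('batch_size=16,outputs_per_step=2,max_iters=300')
  print(hparams.batch_size, hparams.outputs_per_step, hparams.max_iters)  # 16 2 300
  ```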
- Monitor with Tensorboard (optional):

  ```
  tensorboard --logdir ~/PycharmProjects/tacotron/logs-tacotron
  ```

  The trainer dumps audio and alignments every 1000 steps. You can find these in `~/PycharmProjects/tacotron/logs-tacotron`.
- Synthesize from a checkpoint:

  ```
  python3 demo_server.py --checkpoint ~/PycharmProjects/tacotron/logs-tacotron/model.ckpt-50000
  ```

  Replace "50000" with the checkpoint number that you want to use, then open a browser to localhost:9000 and type what you want to speak. Alternately, you can run eval.py at the command line:

  ```
  python3 eval.py --checkpoint ~/PycharmProjects/tacotron/logs-tacotron/model.ckpt-50000
  ```

  If you set the `--hparams` flag when training, set the same value here.
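  You can also synthesize programmatically. The sketch below assumes the upstream synthesizer.py API (Synthesizer.load() plus synthesize() returning WAV bytes) is unchanged in this fork:

  ```python
  # Load a trained checkpoint and write one synthesized sentence to disk.
  # Assumption: synthesizer.Synthesizer behaves as in the upstream Tacotron code.
  import os
  from synthesizer import Synthesizer

  checkpoint = os.path.expanduser(
      '~/PycharmProjects/tacotron/logs-tacotron/model.ckpt-50000')

  synth = Synthesizer()
  synth.load(checkpoint)

  wav_bytes = synth.synthesize('नमस्ते')  # Nepali Unicode input for the Nepali model
  with open('output.wav', 'wb') as f:
      f.write(wav_bytes)
  ```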
- During eval and training, audio length is limited to `max_iters * outputs_per_step * frame_shift_ms` milliseconds. With the defaults (max_iters=200, outputs_per_step=5, frame_shift_ms=12.5), this is 12.5 seconds. If your training examples are longer, you will see an error like this:

  ```
  Incompatible shapes: [32,1340,80] vs. [32,1000,80]
  ```

  To fix this, you can set a larger value of `max_iters` by passing `--hparams="max_iters=300"` to train.py (replace "300" with a value based on how long your audio is and the formula above).
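  To pick a value, invert the formula above. A small worked example using the default hparams quoted in this note:

  ```python
  # Compute the smallest max_iters that covers your longest training clip,
  # using: audio limit (ms) = max_iters * outputs_per_step * frame_shift_ms.
  import math

  outputs_per_step = 5    # hparams.py default
  frame_shift_ms = 12.5   # hparams.py default

  def min_max_iters(longest_clip_seconds):
      frames = longest_clip_seconds * 1000 / frame_shift_ms
      return math.ceil(frames / outputs_per_step)

  print(min_max_iters(12.5))  # 200 -> matches the 12.5 second default limit
  print(min_max_iters(18.0))  # 288 -> so --hparams="max_iters=300" leaves headroom
  ```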
- In this fork the base directory is assumed to be `~/PycharmProjects/tacotron`; adjust the paths above to match the directory you cloned into.