An implementation of Tacotron speech synthesis in TensorFlow, modified for a Nepali dataset.
- Install Python 3.
- Install the latest version of TensorFlow for your platform. For better performance, install with GPU support if it's available. This code works with TensorFlow 1.3 and later.
- Install requirements:

  ```
  pip install -r requirements.txt
  ```
- Download and unpack a model. For Nepali, download the Nepali pretrained model checkpoints at 30k, 50k, and 75k iterations; for the best results, use the checkpoint trained for 75k iterations.
- Run the demo server (Nepali demo):

  ```
  python3 demo_server.py --checkpoint <full_path_to_pretrained_model>/model.ckpt-30000
  ```

  Replace "30000" with the iteration count of the checkpoint you downloaded.
- Point your browser at localhost:9000
  - Type what you want to synthesize.
  - For Nepali, use Nepali Unicode sentences. (You can also script requests; see the sketch below.)
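  If you prefer to script the demo instead of typing into the browser form, the sketch below assumes demo_server.py exposes a GET /synthesize endpoint with a `text` parameter that returns WAV audio, as the upstream Tacotron demo server does; adjust if your copy differs.

  ```python
  # Query the running demo server and save the returned audio.
  # Assumption: GET /synthesize?text=... returns WAV bytes (upstream demo server behaviour).
  import requests

  text = 'नमस्ते'  # for the Nepali model, send Nepali Unicode sentences
  resp = requests.get('http://localhost:9000/synthesize', params={'text': text})
  resp.raise_for_status()

  with open('demo.wav', 'wb') as f:
      f.write(resp.content)
  ```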
- Download a speech dataset:
  - ne_np_female (Creative Commons Attribution Share-Alike)
- Unpack the dataset into `~/PycharmProjects/tacotron`. For Nepali TTS the folder should look like this (the extracted folder is renamed to `nepali` for simplicity):

  ```
  tacotron
    |- nepali
        |- line_index.tsv
        |- wavs
  ```
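  Before preprocessing, a quick sanity check of this layout can save time. The sketch below relies only on the folder names shown above (line_index.tsv and wavs/ under nepali/) and compares the transcript count to the wav count:

  ```python
  # Rough sanity check of the dataset layout described above.
  # Adjust the path if you cloned to a different directory.
  import os

  root = os.path.expanduser('~/PycharmProjects/tacotron/nepali')

  with open(os.path.join(root, 'line_index.tsv'), encoding='utf-8') as f:
      n_lines = sum(1 for line in f if line.strip())

  wav_dir = os.path.join(root, 'wavs')
  n_wavs = sum(1 for name in os.listdir(wav_dir) if name.endswith('.wav'))

  # Expect roughly one transcript line per wav file.
  print(f'{n_lines} transcript lines, {n_wavs} wav files')
  ```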
- Preprocess the data:

  ```
  python3 preprocess.py --dataset nepali
  ```

  - For the Nepali dataset, hparams.py is set to `cleaners='transliteration_cleaners'`. If you are using another dataset, change it back to the default.
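  To see what this cleaner does to Nepali input, you can call it directly. This is a sketch that assumes the upstream text/cleaners.py module is present (run it from the repository root); the exact romanization depends on the unidecode package:

  ```python
  # Preview the transliteration cleaner on a Nepali Unicode sentence.
  # Assumption: text/cleaners.py from the upstream Tacotron code is unchanged.
  from text.cleaners import transliteration_cleaners

  print(transliteration_cleaners('नमस्ते संसार'))
  # Expect a lowercase ASCII transliteration (the precise spelling comes from unidecode).
  ```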
- Train a model:

  ```
  python3 train.py
  ```

  Tunable hyperparameters are found in hparams.py. You can adjust these at the command line using the `--hparams` flag, for example `--hparams="batch_size=16,outputs_per_step=2"`. Hyperparameters should generally be set to the same values at both training and eval time.
  - For the Nepali dataset, use `python3 train.py --hparams="max_iters=300"`. See "Notes and Common Issues" for details.
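  The override string is parsed into the hparams object before training starts. As a rough sketch (assuming hparams.py builds a tf.contrib.training.HParams object, as in the upstream code, so this requires TensorFlow 1.x):

  ```python
  # Show how a --hparams override string changes the values used for training.
  # Assumption: hparams.py exposes `hparams` as a tf.contrib.training.HParams object.
  from hparams import hparams

  hparams.parse('batch_size=16,outputs_per_step=2,max_iters=300')
  print(hparams.batch_size, hparams.outputs_per_step, hparams.max_iters)  # 16 2 300
  ```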
- Monitor with Tensorboard (optional):

  ```
  tensorboard --logdir ~/PycharmProjects/tacotron/logs-tacotron
  ```

  The trainer dumps audio and alignments every 1000 steps. You can find these in `~/PycharmProjects/tacotron/logs-tacotron`.
- Synthesize from a checkpoint:

  ```
  python3 demo_server.py --checkpoint ~/PycharmProjects/tacotron/logs-tacotron/model.ckpt-50000
  ```

  Replace "50000" with the checkpoint number that you want to use, then open a browser to localhost:9000 and type what you want to speak. Alternately, you can run eval.py at the command line:

  ```
  python3 eval.py --checkpoint ~/PycharmProjects/tacotron/logs-tacotron/model.ckpt-50000
  ```

  If you set the `--hparams` flag when training, set the same value here.
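  You can also synthesize programmatically. The sketch below assumes the upstream synthesizer.py API (Synthesizer.load() plus synthesize() returning WAV bytes) is unchanged in this fork:

  ```python
  # Load a trained checkpoint and write one synthesized sentence to disk.
  # Assumption: synthesizer.Synthesizer behaves as in the upstream Tacotron code.
  import os
  from synthesizer import Synthesizer

  checkpoint = os.path.expanduser(
      '~/PycharmProjects/tacotron/logs-tacotron/model.ckpt-50000')

  synth = Synthesizer()
  synth.load(checkpoint)

  wav_bytes = synth.synthesize('नमस्ते')  # Nepali Unicode input for the Nepali model
  with open('output.wav', 'wb') as f:
      f.write(wav_bytes)
  ```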
- During eval and training, audio length is limited to `max_iters * outputs_per_step * frame_shift_ms` milliseconds. With the defaults (max_iters=200, outputs_per_step=5, frame_shift_ms=12.5), this is 12.5 seconds. If your training examples are longer, you will see an error like this:

  ```
  Incompatible shapes: [32,1340,80] vs. [32,1000,80]
  ```

  To fix this, you can set a larger value of `max_iters` by passing `--hparams="max_iters=300"` to train.py (replace "300" with a value based on how long your audio is and the formula above).
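  To pick a value, invert the formula above. A small worked example using the default hparams quoted in this note:

  ```python
  # Compute the smallest max_iters that covers your longest training clip,
  # using: audio limit (ms) = max_iters * outputs_per_step * frame_shift_ms.
  import math

  outputs_per_step = 5    # hparams.py default
  frame_shift_ms = 12.5   # hparams.py default

  def min_max_iters(longest_clip_seconds):
      frames = longest_clip_seconds * 1000 / frame_shift_ms
      return math.ceil(frames / outputs_per_step)

  print(min_max_iters(12.5))  # 200 -> matches the 12.5 second default limit
  print(min_max_iters(18.0))  # 288 -> so --hparams="max_iters=300" leaves headroom
  ```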
- In this fork the base directory is assumed to be `~/PycharmProjects/tacotron`; adjust the paths above to match the directory you cloned into.