This repository contains code to train phrasing models for Text-to-Speech systems. The models are trained using the LibriTTS alignments available at kan-bayashi/LibriTTSLabel. The train-clean-360 split is used for training, while the dev-clean and test-clean splits are used for validation and testing, respectively.
- Download the dataset kan-bayashi/LibriTTSLabel.
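  The README does not prescribe a download method; a minimal sketch, assuming the dataset is the GitHub repository kan-bayashi/LibriTTSLabel:

  ```bash
  # Clone the LibriTTS label/alignment data (assumes git is installed)
  git clone https://github.com/kan-bayashi/LibriTTSLabel.git
  ```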
- Preprocess the downloaded LibriTTS Label dataset and transform it to a format suitable for the model:

  ```bash
  python utils/build_LibriTTS_label_dataset.py \
      --raw_dataset_dir <Path to the downloaded dataset> \
      --processed_dataset_dir <Output dir, where the processed dataset will be written>
  ```
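  For example, with illustrative paths (the directory names below are placeholders, not prescribed by the repository):

  ```bash
  python utils/build_LibriTTS_label_dataset.py \
      --raw_dataset_dir LibriTTSLabel \
      --processed_dataset_dir data/processed
  ```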
- Build vocabularies of words and tags from the processed dataset, for training word embeddings from scratch:

  ```bash
  python utils/build_vocab_word_embeddings.py \
      --data_dir <Directory containing the processed dataset>
  ```
  Running this script saves the vocabulary files `data_dir/vocab/words.txt` and `data_dir/vocab/tags.txt`, containing all the words and tags in the dataset. It also saves `data_dir/vocab/dataset_params.json` with some extra information.
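  The resulting layout looks roughly like this (a sketch based on the file names above; exact contents depend on the dataset):

  ```
  data_dir/
  └── vocab/
      ├── words.txt            # all words in the dataset (assumed one per line)
      ├── tags.txt             # all tags in the dataset
      └── dataset_params.json  # extra dataset information
  ```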
- All model parameters as well as training hyperparameters are specified in `config/word_embedding_blstm_config.json`, which looks like:

  ```json
  {
      "embedding_dim": 50,
      "blstm_size": 512,
      "batch_size": 64,
      "lr": 1e-5,
      "num_epochs": 50
  }
  ```

  To experiment with different values for model parameters/training hyperparameters, modify this file.
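  For example, a hypothetical variant that trades a smaller BLSTM for a larger learning rate (the values below are purely illustrative, not recommendations):

  ```json
  {
      "embedding_dim": 50,
      "blstm_size": 256,
      "batch_size": 32,
      "lr": 1e-4,
      "num_epochs": 50
  }
  ```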
- Train the model:

  ```bash
  python word_embedding_blstm_train.py \
      --config_file <path to config.json> \
      --data_dir <Directory containing the processed dataset> \
      --experiment_dir <Directory where training artifacts will be saved> \
      --resume_checkpoint_path <If specified, load the checkpoint and resume training>
  ```
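  A concrete invocation, reusing the illustrative paths from the preprocessing step (the experiment directory name is a placeholder):

  ```bash
  python word_embedding_blstm_train.py \
      --config_file config/word_embedding_blstm_config.json \
      --data_dir data/processed \
      --experiment_dir experiments/blstm_baseline
  # add --resume_checkpoint_path <checkpoint> to continue an interrupted run
  ```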
- Evaluate the model on the held-out test set:

  ```bash
  python word_embedding_blstm_evaluate.py \
      --config_file <path to config.json> \
      --vocab_dir <Directory containing the vocab files> \
      --test_data_dir <Directory containing the held-out test set> \
      --model_checkpoint <Trained model checkpoint to use for eval>
  ```
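  For example, with the same placeholder paths as above (the test directory and checkpoint filename are hypothetical; use whichever checkpoint your training run produced):

  ```bash
  python word_embedding_blstm_evaluate.py \
      --config_file config/word_embedding_blstm_config.json \
      --vocab_dir data/processed/vocab \
      --test_data_dir data/processed/test \
      --model_checkpoint experiments/blstm_baseline/best_model.pth
  ```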