Phrase break prediction for Text-to-Speech systems

This repository contains code to train phrasing models for Text-to-Speech systems. The models are trained using LibriTTS alignments available at kan-bayashi/LibriTTSLabel. The train-clean-360 split is used for training, and the dev-clean and test-clean splits are used for validation and testing, respectively.

Quick start

Download and preprocess the dataset

Download the dataset from kan-bayashi/LibriTTSLabel.

Preprocess the downloaded LibriTTSLabel dataset and transform it into a format suitable for the model

```
python utils/build_LibriTTS_label_dataset.py \
    --raw_dataset_dir <Path to the downloaded dataset> \
    --processed_dataset_dir <Output dir, where the processed dataset will be written>
```

Train Word Embedding + BLSTM model

Build vocabularies of words and tags from the processed dataset. These are needed to train word embeddings from scratch.
```
python utils/build_vocab_word_embeddings.py \
    --data_dir <Directory containing the processed dataset>
```
Running this script will save vocabulary files data_dir/vocab/words.txt and data_dir/vocab/tags.txt containing all the words and tags in the dataset. It will also save data_dir/vocab/dataset_params.json with some extra information.
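
As a rough illustration of what such a vocabulary step involves, here is a minimal sketch. It is not the code in utils/build_vocab_word_embeddings.py: the function name `build_vocab`, the tag names `B` and `NB`, the one-entry-per-line file layout, and the keys written to dataset_params.json are all assumptions made for this example.

```python
import json
from collections import Counter
from pathlib import Path


def build_vocab(sentences, tag_sequences, vocab_dir):
    """Collect the unique words and tags and write them under vocab_dir.

    sentences:     list of word lists, e.g. [["the", "cat"], ...]
    tag_sequences: list of tag lists aligned with sentences
    """
    word_counts = Counter(w for sent in sentences for w in sent)
    tag_counts = Counter(t for seq in tag_sequences for t in seq)

    # Sort so that the line index of each entry is reproducible across runs.
    words = sorted(word_counts)
    tags = sorted(tag_counts)

    vocab_dir = Path(vocab_dir)
    vocab_dir.mkdir(parents=True, exist_ok=True)
    (vocab_dir / "words.txt").write_text("\n".join(words) + "\n", encoding="utf-8")
    (vocab_dir / "tags.txt").write_text("\n".join(tags) + "\n", encoding="utf-8")

    # Hypothetical summary fields; the real dataset_params.json may differ.
    params = {"vocab_size": len(words), "num_tags": len(tags)}
    (vocab_dir / "dataset_params.json").write_text(
        json.dumps(params, indent=4), encoding="utf-8"
    )
    return words, tags, params
```

With a layout like this, a word's line number in words.txt can serve as its index into the embedding table.
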
All model parameters as well as training hyperparameters are specified in config/word_embedding_blstm_config.json, which looks like
```
{
    "embedding_dim": 50,
    "blstm_size": 512,
    "batch_size": 64,
    "lr": 1e-5,
    "num_epochs": 50
}
```
To experiment with different model parameters or training hyperparameters, edit this file.
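
For readers who want to load this file in their own scripts, the following sketch parses it into a typed object. The `TrainConfig` class and the `parse_config` and `load_config` helpers are hypothetical names introduced here; the training script may read the file differently.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Field names mirror the keys shown in the config file above.
    embedding_dim: int
    blstm_size: int
    batch_size: int
    lr: float
    num_epochs: int


def parse_config(text):
    """Parse a JSON string into a TrainConfig.

    Raises TypeError if a key is missing or an unknown key is present,
    which catches typos in the config file early.
    """
    return TrainConfig(**json.loads(text))


def load_config(path):
    """Read the JSON config file at path and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_config(f.read())
```

For example, `load_config("config/word_embedding_blstm_config.json").lr` would return the learning rate as a float.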

Train the model

```
python word_embedding_blstm_train.py \
    --config_file <path to config.json> \
    --data_dir <Directory containing the processed dataset> \
    --experiment_dir <Directory where training artifacts will be saved> \
    --resume_checkpoint_path <If specified, load specified checkpoint and resume training>
```

Evaluate the model on the held-out test set

```
python word_embedding_blstm_evaluate.py \
    --config_file <path to config.json> \
    --vocab_dir <Directory containing the vocab files> \
    --test_data_dir <Directory containing the heldout test set> \
    --model_checkpoint <Trained model checkpoint to use for eval>
```
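
This README does not say which metric the evaluation script reports. Phrase break predictors are commonly scored with precision, recall and F1 on the break tag, so the sketch below shows that computation under the assumption that breaks are tagged `B`. The function name `break_f1` and the tag names are hypothetical.

```python
def break_f1(gold, pred, break_tag="B"):
    """Return (precision, recall, f1) for break_tag over two aligned tag lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    # True positive: a break was predicted where the reference has a break.
    tp = sum(g == break_tag and p == break_tag for g, p in pairs)
    # False positive: a break was predicted where the reference has none.
    fp = sum(g != break_tag and p == break_tag for g, p in pairs)
    # False negative: a reference break was missed.
    fn = sum(g == break_tag and p != break_tag for g, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Scoring only the break tag avoids the inflated numbers that plain accuracy gives when most word boundaries carry no break.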

merumeru-rururu/phrase_break_prediction
