A sequence-to-sequence model that translates lowercase English words with no punctuation into their ARPAbet phonetic representations. Most of the code for the seq2seq attention model itself is adapted from Sean Robertson's seq2seq + attention PyTorch tutorial.
The code that builds the training corpus is my own. It takes NLTK's version of the CMU Pronouncing Dictionary and builds a corpus of over 133,000 training instances. I also added a Levenshtein similarity metric that can be used to measure model accuracy.
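The metric itself is not shown here, but a Levenshtein-based similarity can be sketched roughly as follows (the names `levenshtein` and `similarity` are illustrative, not the repo's actual API):

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or phoneme token lists)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def similarity(pred, target):
    """Normalize edit distance to [0, 1]; 1.0 means an exact match."""
    if not pred and not target:
        return 1.0
    return 1.0 - levenshtein(pred, target) / max(len(pred), len(target))
```

Applied to predicted vs. reference pronunciations (e.g. split into phoneme tokens first), this rewards partially correct outputs instead of scoring them all-or-nothing.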
To install:

```
git clone https://github.com/epeters3/phonemicizer.git
cd phonemicizer
pip3 install -r requirements.txt
```
Both a CLI and a Python API are available for using the current best model saved in the repo to predict phonetic representations.
The module can accept text via stdin, phonemicize it, and write the result to stdout, e.g.:

```
$ echo "Interesting." | python3 -m phonemicizer
N T ER AH S T IH NG
```
Or, via the Python API:

```python
from phonemicizer import TrainedPhonemicizer

model = TrainedPhonemicizer()
model.predict("Interesting.")
# N T ER AH S T IH NG
```
To assess model performance by evaluating the trained model on some samples:

```
python3 -m phonemicizer.evaluate [--n <number_of_samples>]
```
The repo includes saved state dictionaries for the model, so it can be used for prediction and evaluation out of the box. To train the model from scratch:
```
python3 -m phonemicizer.train \
    --n_iters <number_of_training_iterations> \
    [--epoch_length 1000] \
    [--plot_every 100] \
    [--learning_rate 0.0001] \
    [--weight_decay 0.0] \
    [--teacher_forcing_ratio 0.5] \
    [--dropout_p 0.1] \
    [--lr_patience 5]
```
The model uses a single GRU layer in the encoder and an attention layer plus a GRU layer in the decoder, with a hidden size of 256 throughout. The trained weights currently in the repo, produced by 120,000 training iterations, reach about 87% accuracy on the held-out test set.
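As a rough illustration of the recurrent update at the heart of those GRU layers, here is a single GRU step written in NumPy with random weights. This is a pedagogical sketch of the standard GRU gating equations, not the repo's actual PyTorch code; the input size and weight initialization are arbitrary, though the hidden size of 256 matches the model's.

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One GRU cell update, following the standard gating equations."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = sigmoid(W[0] @ x + U[0] @ h + b[0])         # update gate
    r = sigmoid(W[1] @ x + U[1] @ h + b[1])         # reset gate
    n = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])   # candidate state
    return (1.0 - z) * n + z * h                    # new hidden state

rng = np.random.default_rng(0)
input_size, hidden_size = 32, 256   # 256 matches the model's hidden size
W = rng.normal(0, 0.1, (3, hidden_size, input_size))
U = rng.normal(0, 0.1, (3, hidden_size, hidden_size))
b = np.zeros((3, hidden_size))

h = np.zeros(hidden_size)            # initial hidden state
x = rng.normal(0, 1.0, input_size)   # stand-in for an embedded input character
h = gru_step(x, h, W, U, b)
```

In the encoder, one such step runs per input character; the decoder's attention layer then weights the resulting hidden states when producing each output phoneme.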