Forked from https://github.com/JoseLlarena/britfoner, with modifications by alexdiment for compatibility with newer Keras versions (>2.2.2).
Automated pronunciation for British English with Keras + TensorFlow
Britfoner is an API for translating English words into their phonetic form (in British English). It uses the Britfone phonetic dictionary for a first lookup and a Keras + TensorFlow grapheme-to-phoneme converter (also trained on Britfone) as a backup.
Britfoner incorporates code from seq2seq and recurrentshop.
Britfoner is limited to words of 18 characters or fewer.
Install
Windows
Download and install Miniconda
Then create a virtual environment:
mkdir g2p
cd g2p
conda create -n g2p python=3.6
activate it:
activate g2p
install britfoner plus dependencies:
(g2p) conda install h5py scikit-learn
(g2p) pip install git+https://github.com/JoseLlarena/britfoner.git
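optionally, sanity-check the install by importing the package (as in the Usage section below, Keras should print its backend banner):
(g2p) python -c "import britfoner.api"
Using TensorFlow backend.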
Linux
Download and install Python 3.6
create a virtual environment:
jose@jose-dev:~/projects$ mkdir g2p && cd g2p && python3 -m venv env
activate it:
jose@jose-dev:~/projects/g2p$ source env/bin/activate
(env) jose@jose-dev:~/projects/g2p$
install britfoner with dependencies:
(env) jose@jose-dev:~/projects/g2p$ pip install git+https://github.com/JoseLlarena/britfoner.git
Usage
(env) jose@jose-dev:~/projects/g2p$ python -c "import britfoner.api as api; print(api.pronounce('success'))"
Using TensorFlow backend.
{('s', 'ə', 'k', 's', 'ɛ', 's')}
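From Python, the same API can be called directly. In the sketch below, 'blorft' is an assumed out-of-vocabulary word, included only to exercise the model fallback:

import britfoner.api as api

# dictionary hit: 'success' is listed in Britfone, so this is a straight lookup
print(api.pronounce('success'))   # {('s', 'ə', 'k', 's', 'ɛ', 's')}

# assumed out-of-vocabulary word: not in Britfone, so the seq2seq model
# predicts the pronunciation instead
print(api.pronounce('blorft'))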
Background
Britfoner maps English words to their pronunciations as per the International Phonetic Alphabet.
It looks up a word (matching [A-Za-z ']+) in Britfone, a British English pronunciation dictionary, returning the possible pronunciations. If the word is not found, it falls back to a Keras deep learning model (with a TensorFlow backend).
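The lookup-then-fallback logic amounts to something like the following sketch, where dictionary and model_decode are hypothetical stand-ins for britfoner's internals:

import re

VALID = re.compile(r"[A-Za-z ']+")

def pronounce(word, dictionary, model_decode):
    # only letters, spaces and apostrophes are supported
    if not VALID.fullmatch(word):
        raise ValueError('unsupported characters in: ' + word)
    # first lookup: Britfone can list several pronunciations per word
    if word in dictionary:
        return dictionary[word]
    # backup: predict a single pronunciation with the seq2seq model
    return {model_decode(word)}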
The model is a sequence-to-sequence model with attention, with a single-layer, 256-hidden-unit bidirectional encoder and decoder. It was trained on 16,042 unaligned word-pronunciation pairs, with 163 pairs held out for validation (a 99%/1% split), for 220 epochs (about 1.5 hours on an i5 + GTX 1050 GPU), using the Adam optimiser with learning rate 10⁻², decay 10⁻⁵ and 10% dropout. The output is unnormalised, the loss is mean squared error, and decoding is greedy. The final word error rate was 15.95%.
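For orientation, a comparable model can be put together with the bundled seq2seq library. This is a minimal sketch with illustrative dimensions, not britfoner's actual training code; the grapheme/phoneme inventory sizes and the output length are assumptions:

import numpy as np
from keras.optimizers import Adam
from seq2seq.models import AttentionSeq2Seq

MAX_WORD_LEN = 18    # britfoner's word-length limit
N_GRAPHEMES = 28     # assumed: 26 letters plus space and apostrophe
MAX_PRON_LEN = 20    # assumed maximum pronunciation length
N_PHONEMES = 50      # assumed phoneme inventory size

# single-layer, 256-hidden-unit seq2seq with attention and a bidirectional encoder
model = AttentionSeq2Seq(input_dim=N_GRAPHEMES, input_length=MAX_WORD_LEN,
                         hidden_dim=256, output_length=MAX_PRON_LEN,
                         output_dim=N_PHONEMES, depth=1,
                         bidirectional=True, dropout=0.1)

# unnormalised outputs trained against one-hot targets with mean squared error
model.compile(loss='mse', optimizer=Adam(lr=1e-2, decay=1e-5))

# greedy decoding: pick the highest-scoring phoneme at each output step
scores = model.predict(np.zeros((1, MAX_WORD_LEN, N_GRAPHEMES)))  # dummy one-hot input
phoneme_ids = scores[0].argmax(axis=-1)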
Changelog
see Changelog
License
GPL2 © Jose Llarena