
Forked from https://github.com/JoseLlarena/britfoner. Includes modifications by alexdiment for compatibility with newer Keras versions (>2.2.2).

britfoner


automated pronunciation for British English with Keras+Tensorflow



Britfoner is an API for translating English words into their phonetic form (in British English). It uses the Britfone phonetic dictionary for a first lookup and a Keras+Tensorflow grapheme-to-phoneme converter (also trained on Britfone) as a backup.

Britfoner incorporates code from seq2seq and recurrentshop.

Britfoner is limited to words of 18 characters or fewer.

Further details

Install

Windows

Download and install Miniconda

Then create a virtual environment

mkdir g2p
cd g2p
conda create -n g2p python=3.6

activate it:

activate g2p

install britfoner plus dependencies:

(g2p) conda install h5py scikit-learn 
(g2p) pip install git+https://github.com/JoseLlarena/britfoner.git

Linux

Download and install Python 3.6

create a virtual environment

jose@jose-dev:~/projects$ mkdir g2p && cd g2p && python3 -m venv env
jose@jose-dev:~/projects/g2p$ 

activate it:

jose@jose-dev:~/projects/g2p$ source env/bin/activate
(env) jose@jose-dev:~/projects/g2p$ 

install britfoner with dependencies:

(env) jose@jose-dev:~/projects/g2p$ pip install git+https://github.com/JoseLlarena/britfoner.git

Usage

(env) jose@jose-dev:~/projects/g2p$ python -c "import britfoner.api as api; print(api.pronounce('success'))"
Using TensorFlow backend.
{('s', 'ə', 'k', 's', 'ɛ', 's')}
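
pronounce returns a set because a word can have more than one pronunciation; each element is a tuple of IPA symbols, as shown above. An illustrative snippet (not part of the package) for turning the candidates into plain strings:

import britfoner.api as api

# pronounce() returns a set of candidate pronunciations,
# each represented as a tuple of IPA symbols
for candidate in api.pronounce('success'):
    print(''.join(candidate))  # e.g. səksɛs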

Full API documentation

Background

Britfoner maps English words to their pronunciations in the International Phonetic Alphabet. It looks up a word (matching [A-Za-z ']+) in Britfone, a British English pronunciation dictionary, and returns the possible pronunciations. If the word is not found, it falls back to a Keras deep learning model (with a Tensorflow backend).
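
A minimal sketch of that two-stage flow, assuming made-up names (dictionary, predict_with_model) that do not correspond to britfoner's internals:

import re

VALID_WORD = re.compile(r"[A-Za-z ']+")

def pronounce(word, dictionary, predict_with_model):
    """Dictionary-first lookup with a grapheme-to-phoneme model as fallback (illustrative)."""
    if not VALID_WORD.fullmatch(word) or len(word) > 18:
        raise ValueError(f'unsupported word: {word!r}')
    key = word.lower()
    if key in dictionary:               # 1) exact lookup in the Britfone dictionary
        return dictionary[key]          #    -> set of IPA-symbol tuples
    return {predict_with_model(key)}    # 2) seq2seq model as backup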

The model is a sequence-to-sequence model with attention, with a single-layered, 256-hidden-unit bidirectional encoder and decoder. It was trained on 16,042 unaligned word-pronunciation pairs, with 163 pairs held out for validation (a 99%/1% split), for 220 epochs (about 1.5 hours on an i5 CPU with a GTX 1050 GPU), using the Adam optimiser with a learning rate of 10⁻², decay of 10⁻⁵ and 10% dropout. The output is unnormalised, the loss is mean squared error and decoding uses a greedy strategy. The final word error rate was 15.95%.
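
For orientation, below is a rough tf.keras sketch of a comparable attention-based sequence-to-sequence model using the hyperparameters above. It is an illustration under stated assumptions, not the actual britfoner architecture, which is built on the seq2seq and recurrentshop libraries; the vocabulary sizes are placeholders.

import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, N_GRAPHEMES, N_PHONEMES, UNITS = 18, 30, 50, 256  # vocabulary sizes are placeholders

# Encoder: single bidirectional LSTM over one-hot grapheme sequences
# (merge_mode='sum' keeps the output dimension at UNITS so the attention shapes match)
enc_in = layers.Input(shape=(MAX_LEN, N_GRAPHEMES))
enc_seq = layers.Bidirectional(layers.LSTM(UNITS, return_sequences=True, dropout=0.1),
                               merge_mode='sum')(enc_in)

# Decoder: LSTM over the shifted target phoneme sequence (teacher forcing at training time)
dec_in = layers.Input(shape=(MAX_LEN, N_PHONEMES))
dec_seq = layers.LSTM(UNITS, return_sequences=True, dropout=0.1)(dec_in)

# Dot-product attention of decoder states over encoder states, then a per-step projection
context = layers.Attention()([dec_seq, enc_seq])
merged = layers.Concatenate()([dec_seq, context])
outputs = layers.TimeDistributed(layers.Dense(N_PHONEMES))(merged)  # unnormalised outputs

model = Model([enc_in, dec_in], outputs)
# the learning-rate decay of 1e-5 is omitted because the kwarg differs across Keras versions
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2),
              loss='mean_squared_error')
model.summary()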

Changelog

see Changelog

License

GPL2 © Jose Llarena