
g2p: Thai Grapheme To Phoneme Conversion

Primary language: Python. License: Apache License 2.0.

g2p_th: A Simple Python Module for Thai Grapheme To Phoneme Conversion

Forked from https://github.com/Kyubyong/g2p, a deep learning seq2seq framework based on TensorFlow.

Notebook: https://colab.research.google.com/drive/1eVFh3336NEDimQPGcrhOul-fmA22UojM

Environment

  • Python 2.x or 3.x

Dependencies

  • numpy >= 1.13.1
  • tensorflow >= 1.3.0
  • nltk >= 3.2.4
  • python -m nltk.downloader "averaged_perceptron_tagger" "cmudict"
  • inflect >= 0.3.1
  • Distance >= 0.1.3
  • future >= 0.16.0
  • PyThaiNLP
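
The minimum versions listed above can be checked against the current environment before installing. The following is a minimal sketch, not part of g2p_th: it assumes Python 3.8 or newer (for `importlib.metadata`) and that the names in the list match the installed distribution names. The helpers `parse_version` and `check_minimums` are hypothetical names introduced for illustration. PyThaiNLP is left out because no minimum version is given for it.

```python
import re
from importlib import metadata  # standard library, Python 3.8+

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "numpy": "1.13.1",
    "tensorflow": "1.3.0",
    "nltk": "3.2.4",
    "inflect": "0.3.1",
    "Distance": "0.1.3",
    "future": "0.16.0",
}


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def check_minimums(minimums, get_version=metadata.version):
    """Map each package name to 'ok', 'too old (<installed>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = "ok"
        else:
            report[name] = "too old ({})".format(installed)
    return report
```

Calling `check_minimums(MINIMUMS)` returns a dictionary with one status per package. The comparison only looks at numeric components, so pre-release suffixes such as `rc1` are ignored.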

Installation

python setup.py install

The required nltk data packages will be downloaded automatically on your first run.

Training (note that a pretrained model is already included)

python train.py

Usage

from g2p_th import g2p
text = "เราเดินเล่น"
print(g2p(text))
>>>['r', 'a', 'w^', '0', ' ', 'r', 'ee', 'z^', '0', ' ', 'r', 'ee', 'z^', '0']
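
The returned list is flat, with `' '` tokens between groups of phonemes. If grouped output is more convenient, the list can be split at those tokens. This is a minimal sketch: `split_on_spaces` is a hypothetical helper, not part of g2p_th, and it assumes that `' '` only ever appears as a separator between units.

```python
def split_on_spaces(phonemes):
    """Group a flat phoneme list into sublists, splitting at ' ' tokens."""
    groups = []
    current = []
    for token in phonemes:
        if token == " ":
            # A separator closes the current group (empty groups are skipped).
            if current:
                groups.append(current)
            current = []
        else:
            current.append(token)
    if current:
        groups.append(current)
    return groups


# Output list copied from the usage example above.
phonemes = ['r', 'a', 'w^', '0', ' ', 'r', 'ee', 'z^', '0', ' ', 'r', 'ee', 'z^', '0']
groups = split_on_spaces(phonemes)
print(groups)
# [['r', 'a', 'w^', '0'], ['r', 'ee', 'z^', '0'], ['r', 'ee', 'z^', '0']]
```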

If you need to convert many texts, you can open one global TensorFlow session and reuse it for every call.

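import g2p_th as g2p

# Any list of Thai strings to convert; a single example is used here.
texts = ["เราเดินเล่น"]
# Open one session and reuse it for every conversion.
with g2p.Session():
    phs = [g2p.g2p(text) for text in texts]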
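
When the input contains many repeated strings, each distinct text only needs to be converted once. The sketch below shows one way to do that. `convert_all` is a hypothetical helper, not part of g2p_th; it takes the converter as an argument, so `g2p.g2p` can be passed in from inside the session block shown above.

```python
def convert_all(texts, convert):
    """Convert each text with `convert`, calling it once per distinct text.

    Repeated texts share the same result object, so copy a result
    before modifying it in place.
    """
    cache = {}
    results = []
    for text in texts:
        if text not in cache:
            cache[text] = convert(text)
        results.append(cache[text])
    return results


# Intended use with the module shown above (not run here):
#   import g2p_th as g2p
#   with g2p.Session():
#       phs = convert_all(texts, g2p.g2p)

# Stand-in converter for demonstration: `list` splits a string into characters.
demo = convert_all(["ab", "cd", "ab"], list)
print(demo)
# [['a', 'b'], ['c', 'd'], ['a', 'b']]
```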