text-frontend-tts

Multilingual text processing API for cleaning, IPA phonemization, tokenization, and translation into sequences of character IDs, for easy stacking with neural Text-to-Speech models.

1 Installation

Supported OS: Unix only

The package provides a simple installation:

  • Clone the repo: git clone https://github.com/ivanvovk/text-frontend-tts.git
  • Enter the root directory: cd text-frontend-tts
  • Run sh install.sh. The script will:
    • Install all necessary Python dependencies
    • Initialize the phonemizer submodule
    • Download and install the G2P backends espeak-ng, festival and mbrola, which phonemizer needs in order to work
    • Install phonemizer as a Python package
    • Install text_frontend as a Python package

2 Usage

The API is intended for preprocessing text inputs for neural TTS systems (i.e. turning text into a sequence of character embedding IDs). The package supports both grapheme and phoneme text representations. (Note: grapheme processing does not support word stressing, whereas phoneme processing does.)

Code examples:

Import:

from text_frontend import TextFrontend

Initialization:

# Encodes phoneme inputs, without word stress marks
tf = TextFrontend(text_cleaners=['basic_cleaners'], use_phonemes=True, n_jobs=1, with_stress=False)

To get the number of supported characters, so that you know how many embeddings to initialize in your TTS neural network (note: the current API supports only the IPA phoneme scheme):

tf = TextFrontend(use_phonemes=False)  # if using graphemes for encoding
print(tf.nchars)
# Output: 119

tf = TextFrontend(use_phonemes=True)  # if using phonemes for encoding
print(tf.nchars)
# Output: 236
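
For instance, tf.nchars can be used directly to size the character embedding table of your model. The sketch below assumes a PyTorch-based TTS model; the embedding dimension of 512 is an arbitrary example value, not something defined by this package.

import torch.nn as nn
from text_frontend import TextFrontend

tf = TextFrontend(use_phonemes=True)

# One embedding vector per supported character/phoneme id.
char_embedding = nn.Embedding(num_embeddings=tf.nchars, embedding_dim=512)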

Text encoding:

# Encodes grapheme inputs
tf = TextFrontend(text_cleaners=['english_cleaners'], use_phonemes=False)

text = "Mr. User, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."
print(tf.graphemes_to_phonemes(text, lang='en-us'))  # G2P conversion still works even with use_phonemes=False
# Output: "m_ˈɪ_s_t_ɚ_._ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

sequence = tf.text_to_sequence(text, lang='en-us')
print(sequence)
# Output: [36, 32, 42, 43, 28, 41, 2, 44, 42, 28, 41, 5, 2, 43, 31, 32, 42, 2, 32, 42, 2, 43, 28, 42, 43, 2, 42, 28, 37, 43, 28, 37, 26, 28, 2, 43, 38, 2, 26, 31, 28, 26, 34, 2, 43, 31, 28, 2, 39, 28, 41, 29, 38, 41, 36, 24, 37, 26, 28, 2, 38, 29, 2, 39, 31, 38, 37, 28, 36, 32, 49, 28, 41, 2, 24, 37, 27, 2, 43, 28, 47, 43, 6, 43, 38, 6, 42, 28, 40, 44, 28, 37, 26, 28, 2, 28, 37, 26, 38, 27, 32, 37, 30, 7, 1]

print(tf.sequence_to_text(sequence))  # however, the encoding corresponds to the grapheme representation only
# Output: "mister user, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."

# Encodes phoneme inputs
tf = TextFrontend(text_cleaners=['english_cleaners'], use_phonemes=True, with_stress=True)

text = "Mr. User, this is test sentence to check the performance of phonemizer and text-to-sequence encoding."
print(tf.graphemes_to_phonemes(text, lang='en-us'))
# Output: "m_ˈɪ_s_t_ɚ_._ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

sequence = tf.text_to_sequence(text, lang='en-us')
print(sequence)
# Output: [153, 45, 42, 225, 89, 135, 127, 122, 137, 89, 5, 135, 76, 159, 42, 135, 159, 137, 135, 225, 87, 42, 225, 135, 42, 87, 165, 225, 77, 165, 42, 135, 225, 77, 135, 55, 87, 160, 135, 76, 77, 135, 147, 89, 38, 83, 153, 77, 165, 42, 135, 104, 139, 135, 38, 123, 165, 153, 217, 137, 89, 135, 133, 165, 151, 135, 225, 87, 160, 42, 225, 6, 135, 225, 77, 6, 135, 42, 141, 160, 35, 77, 165, 42, 135, 158, 40, 160, 123, 151, 159, 40, 7, 1]

print(tf.sequence_to_text(sequence))  # encoding corresponds to phoneme representation
# Output: "m_ˈɪ_s_t_ɚ_ _j_ˈuː_z_ɚ_,_ _ð_ɪ_s_ _ɪ_z_ _t_ˈɛ_s_t_ _s_ˈɛ_n_t_ə_n_s_ _t_ə_ _tʃ_ˈɛ_k_ _ð_ə_ _p_ɚ_f_ˈoːɹ_m_ə_n_s_ _ʌ_v_ _f_ˈoʊ_n_m_aɪ_z_ɚ_ _æ_n_d_ _t_ˈɛ_k_s_t_-_ _t_ə_-_ _s_ˈiː_k_w_ə_n_s_ _ɛ_ŋ_k_ˈoʊ_d_ɪ_ŋ_."

Just cleaning the text:

from text_frontend import clean_text

text = "Mr. User, this is test sentence   to check the performance of text cleaning. It costs $0."
print(clean_text(text, ['english_cleaners']))
# Output: "mister user, this is test sentence to check the performance of text cleaning. it costs zero dollars."

For more details, refer to the docstrings of the corresponding functions.