No delimiter in predictions from multi-letter phoneme codes

Question

thelahunginjeet opened this issue 2 years ago · 1 comments

Thank you for putting this out there. I'm trying to train the model myself on English CMU pronunciations, which have multi-letter phoneme codes. I structure my phoneme transcriptions as lists, for example:

('en_us', 'timbre', ['T','IH1','M','B','ER0'])

The model trains fine, but when I ask for transcriptions (via, say, phonemise_list()), the model output doesn't put delimiters between the phonemes, so its version of 'timbre' is:

'TAY1MBER0'

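This is not helpful, because with multi-letter codes the string can't be reliably split back into individual phonemes. It's also not what the pre-trained CMU model does; that model produces output like: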

'[T][AY1][M][B][ER0]'

How can I adjust the config file or the calls to train() so that I get back something with delimiters between the phonemes?

Answer 1 · 2022-10-14T18:40:14.000Z

I figured this out, in case anyone else is having the issue. The trick is to include delimiters in both your phoneme inventory and the transcriptions. So, in the config file, instead of:

phoneme_symbols: ['T', 'UW1', 'S', . . .]

you want

phoneme_symbols: ['[T]', '[UW1]', '[S]', . . .]

And the training/test samples look like:

('en_us', 'timbre', ['[T]', '[IH1]', '[M]', '[B]', '[ER0]'])
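If your data is already in the undelimited form, you don't have to edit it by hand. Here is a minimal sketch of how the symbols could be wrapped before training, assuming samples are (language, word, phoneme list) tuples like the one above. The helper names bracket and bracket_samples are my own for illustration and are not part of the library.

```python
from typing import List, Tuple

# Assumed sample shape: (language code, word, list of phoneme codes).
Sample = Tuple[str, str, List[str]]


def bracket(symbols: List[str]) -> List[str]:
    """Wrap each phoneme code in square brackets, e.g. 'IH1' -> '[IH1]'."""
    return [f'[{s}]' for s in symbols]


def bracket_samples(samples: List[Sample]) -> List[Sample]:
    """Apply the same wrapping to the phoneme list of every sample."""
    return [(lang, word, bracket(phons)) for lang, word, phons in samples]


# Hypothetical inputs: a partial inventory and one sample in the original format.
raw_symbols = ['T', 'IH1', 'M', 'B', 'ER0']
raw_samples = [('en_us', 'timbre', ['T', 'IH1', 'M', 'B', 'ER0'])]

phoneme_symbols = bracket(raw_symbols)      # goes into the config
train_data = bracket_samples(raw_samples)   # goes into training
```

Using the same function for the inventory and the samples keeps the two in sync, which matters because the bracketed strings have to match exactly.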

If you do that, you'll get predictions that look like what the pre-trained CMU model produces.
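Once the predictions come back bracketed, they are easy to split into a list again. A rough sketch using only the standard library, assuming the prediction is a plain string such as '[T][AY1][M][B][ER0]'; split_phonemes is a hypothetical helper name, not a library function.

```python
import re
from typing import List

# Matches the text between one '[' and the next ']'.
_PHONEME = re.compile(r'\[([^\[\]]+)\]')


def split_phonemes(prediction: str) -> List[str]:
    """Return the phoneme codes in a bracket-delimited prediction string."""
    return _PHONEME.findall(prediction)


phonemes = split_phonemes('[T][AY1][M][B][ER0]')
```

Anything outside the brackets is ignored, so a string with no delimiters at all gives back an empty list rather than a wrong split.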