Confusion of the Phoneme Generation

Question

Closed this issue 7 years ago · 1 comment

The paper says the phoneme transcription of the text is generated with the CMU lexicon. This code, however, uses phonemizer, a toolkit that uses the US phoneset. The two differ slightly in both the phoneme set and the number of phonemes. The paper also mentions adding two phonemes for pauses of two different lengths, but I cannot find where that is done in the code.

Thanks!

Answer 1 · 2018-03-19T08:30:36.000Z

Hi!

Thanks for pointing out the difference.
The silence phonemes are added during the feature extraction phase (which uses merlin). They are marked as 'pau' and 'ssil'. Check out the merlin code for more details - https://github.com/CSTR-Edinburgh/merlin.
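
For anyone who wants to check the inventory themselves, here is a rough sketch (not code from this repository or from merlin) that compares the phoneme symbols found in extracted sequences against the 39 ARPAbet phonemes of the CMU Pronouncing Dictionary and reports the silence markers separately. The function name `compare_inventory` and the example sequence are hypothetical and only for illustration. The only names taken from this thread are 'pau' and 'ssil'.

```python
# The 39 ARPAbet phonemes of the CMU Pronouncing Dictionary
# (lowercased, stress digits removed).
CMU_PHONEMES = set(
    "aa ae ah ao aw ay b ch d dh eh er ey f g hh ih iy jh k l m n ng "
    "ow oy p r s sh t th uh uw v w y z zh".split()
)

# Silence markers named in the answer above.
SILENCE_MARKERS = {"pau", "ssil"}


def compare_inventory(sequences, reference=CMU_PHONEMES):
    """Summarise how the symbols in `sequences` differ from `reference`.

    `sequences` is an iterable of phoneme lists, one list per utterance.
    """
    observed = {p for seq in sequences for p in seq}
    return {
        "observed_count": len(observed),
        # Silence markers that actually occur in the data.
        "silence": sorted(observed & SILENCE_MARKERS),
        # Symbols that are neither CMU phonemes nor silence markers.
        "extra": sorted(observed - reference - SILENCE_MARKERS),
        # CMU phonemes that never occur in the data.
        "unused": sorted(reference - observed),
    }


# Made-up input for illustration only; real sequences would come from
# the extracted features.
example = [["pau", "hh", "ax", "l", "ow", "ssil", "w", "er", "l", "d", "pau"]]
report = compare_inventory(example)
print(report["silence"], report["extra"])
```

Running this over the real extracted sequences should show whether 'pau' and 'ssil' are present and which symbols fall outside the CMU set.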