Confusion about the Phoneme Generation
Closed this issue · 1 comments
darongliu commented
The paper says the phoneme transcription of the text is generated with the CMU lexicon. However, this code uses phonemizer, a toolkit that uses the US phoneset. The two differ slightly in both the phoneme set and the number of phonemes. The paper also mentions adding two phonemes for pauses of two different lengths, but I cannot find where this is done in the code.
Thanks!
adampolyak commented
Hi!
Thanks for pointing out the difference.
The silence phonemes are added during the feature extraction phase (which uses merlin). They are marked as 'pau' and 'ssil'. Check out the merlin code for more details: https://github.com/CSTR-Edinburgh/merlin
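To check where the pause labels end up after feature extraction, something like the following may help. It is only a minimal sketch: the labels 'pau' and 'ssil' are the ones mentioned above, but the function name `split_on_pauses` and the example sequence are hypothetical and not taken from merlin or from this repository.

```python
# Pause labels named in this thread; everything else here is illustrative.
PAUSE_LABELS = {"pau", "ssil"}


def split_on_pauses(phones):
    """Return (chunks, pauses) for a list of phone labels.

    chunks: runs of consecutive non-pause phones.
    pauses: (index, label) pairs for every pause label found.
    """
    chunks, current, pauses = [], [], []
    for i, phone in enumerate(phones):
        if phone in PAUSE_LABELS:
            pauses.append((i, phone))
            # A pause closes the current run of speech phones, if any.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(phone)
    if current:
        chunks.append(current)
    return chunks, pauses


# Hypothetical label sequence, NOT real merlin output.
example = ["pau", "hh", "ax", "l", "ow", "ssil", "w", "er", "l", "d", "pau"]
chunks, pauses = split_on_pauses(example)
```

Running this over the phone labels from your own extracted features should show whether both pause symbols are present and where they fall relative to the speech phones.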