Doesn't it support phoneme segmentation for Japanese?
yw0nam opened this issue · 8 comments
Hi, thanks for the great work and for sharing your research.
I'm really excited to apply your work to my TTS model.
In your paper, the proposed model supports phoneme segmentation to distinguish phonemes belonging to different word tokens.
But when I run the code for Japanese, the output phoneme sequence has no segmentation markers:
from transformers import AutoModel, AutoTokenizer
from text2phonemesequence import Text2PhonemeSequence
# Load XPhoneBERT model and its tokenizer
xphonebert = AutoModel.from_pretrained("vinai/xphonebert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")
# "これは、テストのだめのテキストです" means -> This is texts for testing.
text2phone_model = Text2PhonemeSequence(language='jpn')
text2phone_model.infer_sentence("これは、テストのだめのテキストです")
# Output: 'k o ɾ e h a t e s ɯ t o n o d a m e n o t e k i s ɯ t o d e s ɯ'
Is this result caused by incorrect usage, or does the model not support phoneme segmentation?
It seems to me that your input Japanese text is not word-segmented, right? @yw0nam
As mentioned in our Readme's Notes, you would have to perform Japanese word segmentation first, before feeding the text into Text2PhonemeSequence.
@datquocnguyen Thanks for replying.
Aha, I see. I have to do word segmentation by inserting spaces.
text2phone_model.infer_sentence("これは, テストの だめの テキスト です.")
# Output: 'k o ɾ e h a ▁ t e s ɯ t o n o ▁ d a m e n o ▁ t e k i s ɯ t o ▁ d e s ɯ'
And, as you can see above, the punctuation marks ("," and ".") are removed from the output.
Is there any way to preserve them?
When we built the dataset for pretraining, we used spaCy for word segmentation. It separates punctuation from words, so we treated each punctuation mark as a word and preserved it. You can use spaCy or another tool that separates punctuation from words to solve this problem.
The input "tokenized" text should be: これは , テストの だめの テキスト です .
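For reference, a minimal sketch of that preprocessing step, assuming spaCy's Japanese pipeline (backed by SudachiPy, installed via `pip install spacy[ja]`) is available; the exact token boundaries may differ slightly depending on the dictionary used:

from spacy import blank
from text2phonemesequence import Text2PhonemeSequence

# Assumption: spaCy's blank Japanese pipeline is installed.
# It splits words and keeps punctuation marks as separate tokens.
nlp = blank("ja")
doc = nlp("これは、テストのだめのテキストです。")
segmented = " ".join(token.text for token in doc)
# e.g. 'これ は 、 テスト の だめ の テキスト です 。'

text2phone_model = Text2PhonemeSequence(language='jpn', is_cuda=True)
print(text2phone_model.infer_sentence(segmented))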
I used spaCy and got this result: ['これ', 'は', '、', 'テスト', 'の', 'だめ', 'の', 'テキスト', 'です', '.']
Yes, after that you need to join the words with spaces, and then use our text2phonemesequence to convert the sentence to a phoneme sequence:
text2phone_model = Text2PhonemeSequence(language='jpn', is_cuda=True)
text2phone_model.infer_sentence(" ".join(['これ', 'は', '、', 'テスト', 'の', 'だめ', 'の', 'テキスト', 'です', '.']))
@linxiaolong01
Hi @datquocnguyen, Thanks for your great work.
Is it possible to segment the IPA phonemes at the syllable level?
For example:
text2phone_model.infer_sentence(" ".join(['これ', 'は', '、', 'テスト', 'の', 'だめ', 'の', 'テキスト', 'です', '.']))
---
current output: 'k o ɾ e ▁ h a ▁ ɕ i ▁ t e s ɯ t o ▁ n o ▁ d a m e ▁ n o ▁ t e k i s ɯ t o ▁ d e s ɯ ▁ .'
desired output: 'k o_ɾ e ▁ h a ▁ ɕ i ▁ t e_s ɯ_t o ▁ n o ▁ d a_m e ▁ n o ▁ t e_k i_s ɯ_t o ▁ d e_s ɯ ▁ .'
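text2phonemesequence itself does not emit syllable boundaries, but since Japanese morae essentially end in a vowel, a rough post-processing pass over the phoneme sequence can approximate the desired format. This is only an illustrative sketch; the vowel set and the handling of moraic ɴ and geminates are assumptions and would need refinement:

# Illustrative heuristic only: group phonemes into rough Japanese morae by
# closing a group after each vowel. The vowel set below is an assumption.
VOWELS = {"a", "i", "ɯ", "e", "o"}

def mark_syllables(phoneme_seq: str) -> str:
    out_words = []
    for word in phoneme_seq.split(" ▁ "):   # words are separated by the ▁ marker
        phones = word.split()
        groups, current = [], []
        for p in phones:
            current.append(p)
            if p in VOWELS:                  # a vowel closes the current group
                groups.append(" ".join(current))
                current = []
        if current:                          # leftover symbols (e.g. ɴ or a lone punctuation mark)
            if groups:
                groups[-1] += " " + " ".join(current)
            else:
                groups.append(" ".join(current))
        out_words.append("_".join(groups))
    return " ▁ ".join(out_words)

print(mark_syllables("k o ɾ e ▁ t e s ɯ t o ▁ d e s ɯ ▁ ."))
# -> 'k o_ɾ e ▁ t e_s ɯ_t o ▁ d e_s ɯ ▁ .'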