Doesn't it support phoneme segmentation for Japanese?
yw0nam opened this issue · 8 comments
Hi, thanks for the great work and for sharing your research.
I'm really excited to apply your work to my TTS model.
In your paper, the proposed model supports phoneme segmentation to distinguish phonemes belonging to different word tokens.
But when I run the code for Japanese, the output phoneme sequence has no segmentation markers:
from transformers import AutoModel, AutoTokenizer
from text2phonemesequence import Text2PhonemeSequence
# Load XPhoneBERT model and its tokenizer
xphonebert = AutoModel.from_pretrained("vinai/xphonebert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/xphonebert-base")
# "これは、テストのだめのテキストです" means -> This is texts for testing.
text2phone_model = Text2PhonemeSequence(language='jpn')
text2phone_model.infer_sentence("これは、テストのだめのテキストです")
# Output: 'k o ɾ e h a t e s ɯ t o n o d a m e n o t e k i s ɯ t o d e s ɯ'
Is this result caused by incorrect usage, or does the model not support phoneme segmentation?
It seems to me that your input Japanese text is not word-segmented, right? @yw0nam
As mentioned in our Readme's Notes, you would have to perform Japanese word segmentation first, before feeding the text into Text2PhonemeSequence.
@datquocnguyen Thanks for replying.
Aha, I see. I have to do word segmentation by inserting spaces.
text2phone_model.infer_sentence("これは, テストの だめの テキスト です.")
# Output: 'k o ɾ e h a ▁ t e s ɯ t o n o ▁ d a m e n o ▁ t e k i s ɯ t o ▁ d e s ɯ'
And, as you can see above, the punctuation marks ("," and ".") are removed from the output.
Is there any way to preserve them?
When we built the dataset for pretraining, we used spaCy for word segmentation. It separates punctuation from words, so we treated each punctuation mark as a word and preserved it. You can use spaCy or another tool that separates punctuation from words to solve this problem.
The input "tokenized" text should be: これは , テストの だめの テキスト です .
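For reference, a minimal sketch of that preprocessing step, assuming spaCy's Japanese pipeline (backed by SudachiPy, installed via `pip install spacy[ja]`) is available; the exact token boundaries may differ slightly depending on the dictionary used:

from spacy import blank
from text2phonemesequence import Text2PhonemeSequence

# Assumption: spaCy's blank Japanese pipeline is installed.
# It splits words and keeps punctuation marks as separate tokens.
nlp = blank("ja")
doc = nlp("これは、テストのだめのテキストです。")
segmented = " ".join(token.text for token in doc)
# e.g. 'これ は 、 テスト の だめ の テキスト です 。'

text2phone_model = Text2PhonemeSequence(language='jpn', is_cuda=True)
print(text2phone_model.infer_sentence(segmented))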
I used spaCy and got this result: ['これ', 'は', '、', 'テスト', 'の', 'だめ', 'の', 'テキスト', 'です', '.']
Yes, after that you need to join the words with spaces, and then use our text2phonemesequence to convert the sentence to a phoneme sequence:
text2phone_model = Text2PhonemeSequence(language='jpn', is_cuda=True)
text2phone_model.infer_sentence(" ".join(['これ', 'は', '、', 'テスト', 'の', 'だめ', 'の', 'テキスト', 'です', '.']))
@linxiaolong01
Hi @datquocnguyen, Thanks for your great work.
Is it possible to segment the IPA phonemes at the syllable level?
For example:
text2phone_model.infer_sentence(" ".join(['これ', 'は', '、', 'テスト', 'の', 'だめ', 'の', 'テキスト', 'です', '.']))
---
current output: 'k o ɾ e ▁ h a ▁ ɕ i ▁ t e s ɯ t o ▁ n o ▁ d a m e ▁ n o ▁ t e k i s ɯ t o ▁ d e s ɯ ▁ .'
desired output: 'k o_ɾ e ▁ h a ▁ ɕ i ▁ t e_s ɯ_t o ▁ n o ▁ d a_m e ▁ n o ▁ t e_k i_s ɯ_t o ▁ d e_s ɯ ▁ .'
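text2phonemesequence itself does not emit syllable boundaries, but since Japanese morae essentially end in a vowel, a rough post-processing pass over the phoneme sequence can approximate the desired format. This is only an illustrative sketch; the vowel set and the handling of moraic ɴ and geminates are assumptions and would need refinement:

# Illustrative heuristic only: group phonemes into rough Japanese morae by
# closing a group after each vowel. The vowel set below is an assumption.
VOWELS = {"a", "i", "ɯ", "e", "o"}

def mark_syllables(phoneme_seq: str) -> str:
    out_words = []
    for word in phoneme_seq.split(" ▁ "):   # words are separated by the ▁ marker
        phones = word.split()
        groups, current = [], []
        for p in phones:
            current.append(p)
            if p in VOWELS:                  # a vowel closes the current group
                groups.append(" ".join(current))
                current = []
        if current:                          # leftover symbols (e.g. ɴ or a lone punctuation mark)
            if groups:
                groups[-1] += " " + " ".join(current)
            else:
                groups.append(" ".join(current))
        out_words.append("_".join(groups))
    return " ▁ ".join(out_words)

print(mark_syllables("k o ɾ e ▁ t e s ɯ t o ▁ d e s ɯ ▁ ."))
# -> 'k o_ɾ e ▁ t e_s ɯ_t o ▁ d e_s ɯ ▁ .'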