bootphon/phonemizer

Bad results when crossing japanese and chinese

Closed · 1 comment

Describe the bug
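When phonemizing Japanese text with the Chinese language, or Chinese text with the Japanese language, the output contains "dʒapəniːz le̞tə" or "tʃaɪniːz le̞tə" (espeak spelling out "Japanese letter" / "Chinese letter") instead of the expected phonemes.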

Phonemizer version
phonemizer-3.3.0
available backends: espeak-ng-1.50, segments-2.2.1
uninstalled backends: espeak-mbrola, festival

System
macOS
Python 3.11.6

To reproduce

from phonemizer import phonemize

word = "電波妨"
print(word, phonemize(word, "ja"))
word = "えっと"
print(word, phonemize(word, "cmn"))
# output:
# 電波妨 (en)tʃaɪniːz(ja)le̞tə (en)tʃaɪniːz(ja)le̞tə (en)tʃaɪniːz(ja)le̞tə 
# えっと (en)dʒapəniːz(cmn)əː1 (en)dʒapəniːz(cmn)əː1 (en)dʒapəniːz(cmn)əː1

Expected behavior

word = "電波妨"
print(word, phonemize(word, "ja"))
word = "えっと"
print(word, phonemize(word, "cmn"))
# output:
# 電波妨 (cmn)daɪanfɔː bəwɒn faŋtuː(ja)
# えっと (ja)e̞tto̞(cmn)

Additional context
I did not try other cross-language combinations, so the bug may affect more than this pair of languages.

Hi, this is an issue with espeak-ng, not phonemizer. For instance, running espeak-ng -v cmn --ipa -x "えっと" directly outputs:
(en)dʒˈa5pə5niː5z(cmn)əː2 (en)dʒa1pə1niː1z(cmn)əː1 (en)dʒa1pə1niː1z(cmn)əː1
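Since the fallback happens inside espeak-ng, one way to at least notice it on the Python side is to look for the language-switch flags such as (en) and (cmn) that appear in the outputs quoted above. The following is only a minimal sketch: the helper names `language_flags` and `fell_back` are hypothetical and not part of phonemizer, and the flag pattern is an assumption based solely on the outputs shown in this issue.

```python
import re

# Assumption: a language switch is marked by a parenthesised voice code
# such as "(en)", "(ja)" or "(cmn)", as seen in the outputs in this issue.
LANG_FLAG = re.compile(r"\(([a-z]{2,3}(?:-[a-z0-9]+)*)\)")


def language_flags(phonemes: str) -> list[str]:
    """Return every language code flagged in a phonemized string, in order."""
    return LANG_FLAG.findall(phonemes)


def fell_back(phonemes: str, requested: str) -> bool:
    """True if the output switched to a language other than the requested one."""
    return any(code != requested for code in language_flags(phonemes))


# Output reported in this issue for phonemize("えっと", "cmn")
sample = "(en)dʒapəniːz(cmn)əː1 (en)dʒapəniːz(cmn)əː1 (en)dʒapəniːz(cmn)əː1"
print(language_flags(sample))
print(fell_back(sample, "cmn"))
```

A check like this does not fix the phonemes; it only lets calling code detect that the requested voice could not handle the input and, for example, retry with a different language.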