Bad results when crossing Japanese and Chinese
Issue closed · 1 comment
amorin-gladia commented
Describe the bug
When phonemizing Japanese text with the Chinese language, or Chinese text with the Japanese language, results such as "dʒapəniːz le̞tə" or "tʃaɪniːz le̞tə" are returned instead of the expected phonemes.
Phonemizer version
phonemizer-3.3.0
available backends: espeak-ng-1.50, segments-2.2.1
uninstalled backends: espeak-mbrola, festival
System
MacOS
Python 3.11.6
To reproduce
from phonemizer import phonemize
word = "電波妨"
print(word, phonemize(word, "ja"))
word = "えっと"
print(word, phonemize(word, "cmn"))
# output:
# 電波妨 (en)tʃaɪniːz(ja)le̞tə (en)tʃaɪniːz(ja)le̞tə (en)tʃaɪniːz(ja)le̞tə
# えっと (en)dʒapəniːz(cmn)əː1 (en)dʒapəniːz(cmn)əː1 (en)dʒapəniːz(cmn)əː1
Expected behavior
word = "電波妨"
print(word, phonemize(word, "ja"))
word = "えっと"
print(word, phonemize(word, "cmn"))
# output:
# 電波妨 (cmn)daɪanfɔː bəwɒn faŋtuː(ja)
# えっと (ja)e̞tto̞(cmn)
Additional context
I did not try other cross-language combinations, but the bug could run deeper.
mmmaat commented
Hi, this is an issue with espeak-ng, not phonemizer. For instance, espeak-ng -v cmn --ipa -x "えっと" outputs:
(en)dʒˈa5pə5niː5z(cmn)əː2 (en)dʒa1pə1niː1z(cmn)əː1 (en)dʒa1pə1niː1z(cmn)əː1
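Since the fallback comes from espeak-ng itself, one way to at least notice it on the caller's side is to look for the language-switch flags such as (en) in the returned string. The following is only a minimal sketch, not part of phonemizer: the helper names `switched_languages` and `has_language_switch` are hypothetical, and the flag pattern (a short lowercase language code in parentheses) is an assumption based on the outputs quoted in this thread.

```python
import re

# Assumed flag shape, inferred from the outputs in this thread: "(en)", "(cmn)", "(ja)".
# Allows an optional hyphenated suffix such as "(en-us)".
_FLAG = re.compile(r"\(([a-z]{2,3}(?:-[a-z0-9]+)*)\)")


def switched_languages(phonemes: str) -> list[str]:
    """Return the language codes found in switch flags, in order of first appearance."""
    seen: list[str] = []
    for code in _FLAG.findall(phonemes):
        if code not in seen:
            seen.append(code)
    return seen


def has_language_switch(phonemes: str, requested: str) -> bool:
    """True if the output contains a flag for a language other than `requested`."""
    return any(code != requested for code in switched_languages(phonemes))
```

For example, passing the output reported above for "えっと" with "cmn" as the requested language would be flagged, because it contains (en). A string without any flags is not flagged. This only detects the switch; it does not produce the expected phonemes.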