bootphon/phonemizer

Two phones in Arabic are mapped to the same symbol

I noticed that "ظ" (ðˤ) and "ذ" (ð) are both mapped to the same symbol, "ð", by the espeak backend:

from phonemizer.backend import BACKENDS
from phonemizer.separator import Separator

g2p = BACKENDS['espeak'](language='ar', words_mismatch='warn', preserve_punctuation=False)
g2p.phonemize(["ظ"], separator=Separator(phone=None, word=' ', syllable='|'), strip=True)

# -----> ðaaʔ

g2p.phonemize(["ذ"], separator=Separator(phone=None, word=' ', syllable='|'), strip=True)

# -----> ðaal

packages:
phonemizer 3.2.1
python 3.8.19

Hi, this comes from the espeak backend, not from phonemizer itself, so there is nothing we can do here. You may want to open an issue in the espeak-ng repo. The same output is reproducible with espeak-ng directly:

$ espeak-ng -q -x --ipa -v ar "ظ"
ðˈaaʔ
$ espeak-ng -q -x --ipa -v ar "ذ"
ðˈaal
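
For anyone who wants to check other letters before reporting upstream, here is a minimal sketch of one way to flag outputs that lack an expected phone. It only compares strings and does not call espeak. The helper name `find_missing_onsets` and the `expected` table are hypothetical and not part of phonemizer or espeak-ng; the two outputs and the expected symbols are copied from this issue.

```python
from typing import Dict, List


def find_missing_onsets(outputs: Dict[str, str], expected: Dict[str, str]) -> List[str]:
    """Return the letters whose phonemized output does not start with the expected phone.

    Hypothetical helper for illustration; not part of phonemizer or espeak-ng.
    """
    missing = []
    for letter, phones in outputs.items():
        # A plain prefix check: "ðaaʔ" does not start with "ðˤ", so "ظ" is flagged.
        if not phones.startswith(expected[letter]):
            missing.append(letter)
    return missing


# Outputs reported in this issue (phonemizer 3.2.1, espeak backend, language='ar').
reported = {"ظ": "ðaaʔ", "ذ": "ðaal"}
# Expected initial phones as stated in the issue description.
expected = {"ظ": "ðˤ", "ذ": "ð"}

print(find_missing_onsets(reported, expected))
```

Note that a prefix check is deliberately simple: it would not notice a letter that gained an unexpected diacritic, only one that lost an expected one.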