Chinese Mandarin, incoherent output and stack smashing
pasinit opened this issue · 4 comments
Describe the bug
I am trying to run the espeak backend on Mandarin Chinese. However, I get different results when feeding the same input multiple times, and eventually I get *** stack smashing detected ***.
Phonemizer version
3.0.1
System
Ubuntu 18.04
To reproduce
from phonemizer.punctuation import Punctuation
from phonemizer.backend import EspeakBackend
backend = EspeakBackend(
    'cmn',
    punctuation_marks=Punctuation.default_marks(),
    preserve_punctuation=False,
    with_stress=False,
    tie=False,
    language_switch='keep-flags',
    words_mismatch='ignore',
)
backend.phonemize(['相'])
# ['ɕiɑŋji2() i2 ']
backend.phonemize(['相'])
# ['əəəəəə ']
backend.phonemize(['相'])
# *** stack smashing detected ***: <unknown> terminated
# Aborted
Expected behavior
All three calls should return the same output.
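A minimal sketch for checking this expectation over more than three calls. The helper name distinct_outputs is hypothetical and not part of phonemizer; it only assumes a callable that takes a list of strings and returns a list of strings, as backend.phonemize does in the snippet above.

```python
from typing import Callable, List, Sequence


def distinct_outputs(
    phonemize: Callable[[List[str]], List[str]],
    text: Sequence[str],
    runs: int = 5,
) -> List[List[str]]:
    """Call `phonemize` on the same input `runs` times.

    Returns the distinct results in first-seen order. A deterministic
    backend yields a list with exactly one element.
    """
    seen: List[List[str]] = []
    for _ in range(runs):
        # Pass a fresh copy each time so the callee cannot mutate our input.
        result = list(phonemize(list(text)))
        if result not in seen:
            seen.append(result)
    return seen
```

With the backend from the reproduction, distinct_outputs(backend.phonemize, ['相']) should return a single-element list on a healthy install; more than one element reproduces the inconsistency without waiting for the crash.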
Hi,
I cannot reproduce your bug. I got a consistent output ɕiɑ5ŋ from 相. I'm using:
$ phonemize --version
phonemizer-3.2.0
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.0
I guess you are using an old version of espeak. Maybe try upgrading to phonemizer-3.2 and espeak-ng-1.50?
Because my Ubuntu version is 18.04, I am stuck with espeak-ng-1.49.2+dfsg-1.
I upgraded phonemizer to 3.2.0, but now I get the following outputs:
backend.phonemize(['相'])
> ['ɕɡŋjits(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjits(kl) ɛ ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjits(kl) ɛ ɛ ']
backend.phonemize(['相'])
> ['ɕəɣjits(kl) ɛ ɛ ']
which does not look right and also varies across runs.
This is the output of phonemize --version:
$ phonemize --version
phonemizer-3.2.0
available backends: espeak-ng-1.49.2, festival-2.5.0, segments-2.2.0
uninstalled backends: espeak-mbrola
To use espeak-ng-1.50 you can either build it from source or use the phonemizer Docker image.
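Before rebuilding, it may help to confirm which espeak-ng version is actually on the PATH. The sketch below is only an illustration: the function names are hypothetical, and it assumes that espeak-ng --version prints a banner containing a dotted version number (for example "eSpeak NG text-to-speech: 1.49.2 ..."), which may differ between builds.

```python
import re
import subprocess
from typing import Optional, Tuple


def parse_espeak_version(banner: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number from a version banner.

    Assumption: the banner contains something like '1.49.2' or '1.50'.
    Returns None when no such number is found.
    """
    match = re.search(r"(\d+(?:\.\d+)+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_espeak_version(binary: str = "espeak-ng") -> Optional[Tuple[int, ...]]:
    """Run `<binary> --version` and parse its output, or return None on failure."""
    try:
        completed = subprocess.run(
            [binary, "--version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_espeak_version(completed.stdout)
```

Version tuples compare element by element, so a result such as (1, 49, 2) sorts below (1, 50), which is the situation described in this thread.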