bootphon/phonemizer

Chinese Mandarin, incoherent output and stack smashing

pasinit opened this issue · 4 comments

Describe the bug
I am trying to run the espeak backend on Mandarin Chinese. However, I get different results when feeding the same input multiple times, and eventually I get *** stack smashing detected ***.

Phonemizer version
3.0.1

System
Ubuntu 18.04

To reproduce

from phonemizer.punctuation import Punctuation
from phonemizer.backend import EspeakBackend
backend = EspeakBackend(
    'cmn',
    punctuation_marks=Punctuation.default_marks(),
    preserve_punctuation=False,
    with_stress=False,
    tie=False,
    language_switch='keep-flags',
    words_mismatch='ignore',
)
backend.phonemize(['相'])
# ['ɕiɑŋji2() i2 ']
backend.phonemize(['相'])
# ['əəəəəə ']
backend.phonemize(['相'])
# *** stack smashing detected ***: <unknown> terminated
# Aborted

Expected behavior
All three calls should return the same output.
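
A minimal sketch of how this expectation could be checked automatically. The helper name `check_deterministic` is hypothetical and not part of phonemizer; it only assumes a callable that takes a list of strings and returns a list of strings, which is how `backend.phonemize` is used above.

```python
from collections import Counter
from typing import Callable, List, Sequence


def check_deterministic(
    phonemize: Callable[[List[str]], List[str]],
    text: Sequence[str],
    runs: int = 5,
) -> Counter:
    """Call `phonemize` on the same input `runs` times and count distinct outputs.

    `check_deterministic` is a hypothetical helper, not a phonemizer API.
    A result with a single key means every call returned the same output.
    """
    outputs: Counter = Counter()
    for _ in range(runs):
        # Convert the returned list to a tuple so it can be used as a key.
        outputs[tuple(phonemize(list(text)))] += 1
    return outputs
```

With the backend from the report this would be called as `check_deterministic(backend.phonemize, ['相'])`. Note that it can only detect differing outputs: a native crash such as the stack smashing abort terminates the whole Python process and cannot be caught here.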

Additional context

Hi,
I cannot reproduce your bug. I get a consistent output, ɕiɑ5ŋ, from 相. I'm using:

$ phonemize --version
phonemizer-3.2.0
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.0

I guess you are using an old version of espeak. Maybe try upgrading to phonemizer-3.2 and espeak-ng-1.50?

Because my Ubuntu version is 18.04, I am stuck with espeak-ng-1.49.2+dfsg-1.

I upgraded phonemizer to 3.2.0, but now the outputs I get are the following:

backend.phonemize([''])
> ['ɕɡŋjits(kl) ɛ ɛ ']
backend.phonemize([''])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize([''])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize([''])
> ['ɕəɣjiʌ(kl) ɛ ɛ ']
backend.phonemize([''])
> ['ɕəɣjits(kl) ɛ ɛ ɛ ']
backend.phonemize([''])
> ['ɕəɣjits(kl) ɛ ɛ ']
backend.phonemize([''])
> ['ɕəɣjits(kl) ɛ ɛ ']

which does not look right and also varies across different runs...

This is the output of phonemize --version:

$ phonemize --version
phonemizer-3.2.0
available backends: espeak-ng-1.49.2, festival-2.5.0, segments-2.2.0
uninstalled backends: espeak-mbrola

To use espeak-ng-1.50, you can either build it from source or use the phonemizer Docker image.
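
To confirm which espeak-ng version phonemizer actually picks up after rebuilding, the `phonemize --version` output quoted in this thread can be parsed. This is only a sketch: the function name `espeak_version` is hypothetical, and it assumes the backend is listed in the `espeak-ng-X.Y[.Z]` form seen in the outputs above.

```python
import re
from typing import Optional, Tuple


def espeak_version(version_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the espeak-ng version from `phonemize --version` output.

    Hypothetical helper: assumes the `espeak-ng-X.Y[.Z]` format shown in
    this thread. Returns None if no espeak-ng entry is found.
    """
    match = re.search(r"espeak-ng-(\d+(?:\.\d+)*)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Output copied from the report above (Ubuntu 18.04 system package).
sample = """phonemizer-3.2.0
available backends: espeak-ng-1.49.2, festival-2.5.0, segments-2.2.0
uninstalled backends: espeak-mbrola"""

version = espeak_version(sample)
print(version)            # (1, 49, 2)
print(version < (1, 50))  # True: older than the 1.50 release that worked here
```

The comparison against 1.50 only reflects what was reported in this thread; it is not a general compatibility rule.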

A quick update: with espeak-ng-1.50 I can reproduce @mmmaat's output. Note that if I compile from the latest sources (1.52), the output is different and looks wrong (but is at least consistent).
For future reference, I installed espeak from this link.
Thanks for your help, @mmmaat.