
Persian Language Support and Punctuation Preservation


1: Farsi Language Support

Currently, the espeak backend only supports the "fa" locale for Farsi. Using "fa-latn" raises an error stating that the language "fa-latn" is not supported.

2: Punctuation Preservation

When converting text to phonemes, the Persian comma (،) is not preserved as expected.


In my case both work:

$ echo 'این یک امتحان است.' | phonemize -b espeak -l fa-latn
iːn jek emtehɑn ast 
$ echo 'این یک امتحان است.' | phonemize -b espeak -l fa
iːn jek emtehɑn ast
$ phonemize --version
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.2.1

In your case it may be an issue with your espeak installation.
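One way to check the installation is to ask the backend which language codes it can see. This is a minimal sketch, assuming a phonemizer 3.x install where `EspeakBackend.supported_languages()` returns a mapping of language codes to names; `missing_languages` and `report_farsi_support` are hypothetical helper names introduced here for illustration.

```python
def missing_languages(wanted, supported):
    """Return the codes from `wanted` that are absent from `supported`."""
    return [code for code in wanted if code not in supported]


def report_farsi_support():
    """Print the espeak version and any Farsi codes the backend lacks."""
    # Imported inside the function so the helper above stays usable
    # on machines without phonemizer or espeak-ng installed.
    from phonemizer.backend import EspeakBackend

    # Assumption: this returns a dict-like mapping of code -> language name.
    supported = EspeakBackend.supported_languages()
    print("espeak version:", EspeakBackend.version())
    print("missing codes:", missing_languages(["fa", "fa-latn"], supported))
```

If `report_farsi_support()` lists "fa-latn" as missing, the installed espeak-ng probably predates that voice and updating it is the likely fix.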

  2. This may be a bug; please provide an example. In any case, you can specify which characters are treated as punctuation with the following option
    (so you can ignore the comma)
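To illustrate why the choice of punctuation characters matters, here is a rough sketch of the general idea behind preserving punctuation: split the text on the declared marks, convert only the chunks in between, and put the marks back. It is not phonemizer's actual implementation; `convert_keeping_marks` and `drop_symbols` are hypothetical names, and `drop_symbols` merely stands in for a converter that discards any symbol it does not know.

```python
import re


def convert_keeping_marks(text, marks, convert):
    """Apply `convert` to the text between marks, leaving the marks in place.

    `marks` must be a non-empty string of punctuation characters.
    """
    # The capturing group makes re.split keep the marks in its result,
    # so the list alternates: chunk, marks, chunk, marks, ...
    pattern = "([" + re.escape(marks) + "]+)"
    pieces = re.split(pattern, text)
    out = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            out.append(piece)           # odd positions hold the captured marks
        else:
            out.append(convert(piece))  # even positions hold ordinary text
    return "".join(out)


def drop_symbols(chunk):
    """Stand-in converter: removes every character that is not a word character or whitespace."""
    return re.sub(r"[^\w\s]", "", chunk)


PERSIAN_COMMA = "\u060c"  # ARABIC COMMA, the character written as ، in the issue
```

With only the Persian comma declared as a mark, that comma survives but a trailing full stop is dropped by the converter; declaring both keeps both.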

Thank you very much for your help!

To address Problem 1, I've updated the espeak-ng version.

For Problem 2, I've successfully preserved the '،' when converting to phonemes using the 'preserve_punctuation' parameter. Here's an example:

import phonemizer.backend

text = 'تست این ماژول، برای انجام تبدیل متن به فنوم، نتیجه قابل مشاهده است.'
farsi_phonemizer = phonemizer.backend.EspeakBackend(language='fa-latn', preserve_punctuation=True, punctuation_marks='،')

This configuration has resolved the issue, and I appreciate your help with it.
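For anyone landing here later, this is a sketch of how a backend configured this way might then be called. It assumes phonemizer 3.x, where `EspeakBackend.phonemize` takes a list of strings and returns a list of strings. `FARSI_MARKS` and `phonemize_farsi` are hypothetical names, and the extra marks beyond the Persian comma are an illustrative choice, not something taken from the issue.

```python
# Assumption: an illustrative set of marks. "\u060c" is the Persian comma (،);
# the ASCII marks are added so that a final full stop is preserved as well.
FARSI_MARKS = "\u060c.!?:;"


def phonemize_farsi(lines, marks=FARSI_MARKS):
    """Phonemize a list of Farsi strings, keeping the given punctuation marks."""
    # Imported inside the function so this module still loads on machines
    # without phonemizer or espeak-ng installed.
    from phonemizer.backend import EspeakBackend

    backend = EspeakBackend(
        language="fa-latn",
        preserve_punctuation=True,
        punctuation_marks=marks,
    )
    # strip=True removes the trailing separator from each output line.
    return backend.phonemize(lines, strip=True)
```

Passing `marks="\u060c"` reproduces the configuration above, in which only the Persian comma is kept.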