bootphon/phonemizer

--preserve-punctuation doesn't preserve parentheses

Closed this issue · 3 comments

Describe the bug
Commas and some other punctuation marks are preserved, but parentheses, brackets, and braces are not.

Phonemizer version
The output of phonemize --version from the command line, very helpful!

phonemizer-3.0.1
available backends: espeak-ng-1.50, espeak-mbrola, segments-2.2.0
uninstalled backends: festival

System
Ubuntu 21.10
Python 3.9.7

To reproduce

echo "I would like a (big) steack, in a [large] hamburger {yes}!" | phonemize -l en-gb --preserve-punctuation

aɪ wʊd laɪk ɐ bɪɡ stiːk, ɪn ɐ lɑːdʒ hambɜːɡə jɛs!

Expected behavior
Parentheses, brackets, braces, and other kinds of punctuation should be preserved in the output.

@mmmaat & @jncasey do you think we should add these punctuation marks ((){}) to the _DEFAULT_MARKS constant?

Before the feature to define punctuation with a regex was added, I typically used ;:,.!?¡¿—…"“”-()‘’*[], so it'd make sense to me to add (){}[] to the defaults.

Sure, I agree, why not!
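
To make the proposal concrete, here is a minimal, standard-library-only sketch that checks which punctuation characters of the example sentence a given mark set would miss. It does not call phonemizer. The helper `uncovered_marks` and the names `CUSTOM_MARKS` and `EXTENDED_MARKS` are hypothetical and only for illustration. `CUSTOM_MARKS` is the set quoted in the comment above, not the library's actual `_DEFAULT_MARKS`.

```python
# Mark set quoted in the thread (a commenter's custom set, NOT phonemizer's default).
CUSTOM_MARKS = ';:,.!?¡¿—…"“”-()‘’*[]'

# Hypothetical extended set that also covers braces, as proposed in the thread.
EXTENDED_MARKS = CUSTOM_MARKS + '{}'

# Example sentence from the bug report (spelling kept as reported).
SAMPLE = "I would like a (big) steack, in a [large] hamburger {yes}!"


def uncovered_marks(text, marks):
    """Return the punctuation characters of `text` that are absent from `marks`.

    A character counts as punctuation here if it is neither alphanumeric
    nor whitespace. Each missing character is reported once, in order of
    first appearance.
    """
    missing = []
    for ch in text:
        is_punct = not ch.isalnum() and not ch.isspace()
        if is_punct and ch not in marks and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    print(uncovered_marks(SAMPLE, CUSTOM_MARKS))
    print(uncovered_marks(SAMPLE, EXTENDED_MARKS))
```

With the set quoted in the thread, only the braces in the example sentence are left uncovered. Until the defaults change, a custom set like this can presumably be supplied through phonemizer's punctuation-marks option, but check the documentation of the installed version for the exact option name before relying on it.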