bootphon/phonemizer

Maximum recursion depth exceeded with --preserve-punctuation and long documents

jncasey opened this issue · 2 comments

Describe the bug
If you try to phonemize a long document (say, a Project Gutenberg ebook) with --preserve-punctuation, phonemize fails with:
fatal error: maximum recursion depth exceeded in comparison

Phonemizer version
phonemizer-3.0.1
available backends: espeak-1.48.3, segments-2.2.0
uninstalled backends: espeak-mbrola, festival

System
Both macOS 11.6 and Ubuntu 20.10

To reproduce
phonemize --preserve-punctuation pg67147.txt
where pg67147.txt is a Project Gutenberg ebook, or any other relatively long text file.

Expected behavior
This looks like the result of the Punctuation restore methods being recursive, combined with my (probably unreasonably long) use case.

Additional context
I'm happy to try refactoring the restore methods into a single iterative method. Just let me know if you'd like me to contribute.
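
To illustrate the suspected failure mode, here is a minimal sketch of a recursive restore next to an iterative one. It is not phonemizer's actual Punctuation code: `split_text`, `restore_recursive`, `restore_iterative` and the `MARKS` pattern are hypothetical, simplified stand-ins. The only assumption is that the recursive version adds one stack frame per punctuation mark, which is what would make a book-length input hit Python's recursion limit.

```python
import re
import sys

# Hypothetical, simplified stand-ins; NOT phonemizer's real implementation.
# A "mark" is a run of punctuation plus any surrounding whitespace.
MARKS = re.compile(r"\s*[,.;:!?]+\s*")


def split_text(text):
    """Return (chunks, marks): the text between marks, and the marks themselves."""
    chunks = MARKS.split(text)    # always len(marks) + 1 items
    marks = MARKS.findall(text)
    return chunks, marks


def restore_recursive(chunks, marks):
    """Re-insert marks recursively: one stack frame per punctuation mark."""
    if not marks:
        return "".join(chunks)
    return chunks[0] + marks[0] + restore_recursive(chunks[1:], marks[1:])


def restore_iterative(chunks, marks):
    """Re-insert marks with a loop: stack depth stays constant."""
    parts = []
    for i, chunk in enumerate(chunks):
        parts.append(chunk)
        if i < len(marks):
            parts.append(marks[i])
    return "".join(parts)


if __name__ == "__main__":
    # Build an input with more marks than the interpreter's recursion limit.
    text = "word, " * (sys.getrecursionlimit() + 100)
    chunks, marks = split_text(text)

    print(restore_iterative(chunks, marks) == text)
    try:
        restore_recursive(chunks, marks)
    except RecursionError as err:
        print("recursive version failed:", err)
```

In this sketch the iterative version handles any number of marks, while the recursive one raises RecursionError as soon as the number of marks exceeds the interpreter's limit (1000 by default in CPython). Raising the limit with `sys.setrecursionlimit` would only move the threshold, which is why a single iterative method seems like the more robust fix.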

Please open a PR. I've got the same issue, and your refactoring seems to fix it.

My previous PR for this project (#103, which adds a simple new feature) is still pending, and I did my refactoring of the punctuation restore method on a branch based on that work. My git skills are kind of rusty, so I'm worried that I'd mess something up if I tried to make a new PR independent of the other one. Hopefully the maintainers will review my first PR soon, and then I can proceed with my fix for this.