Describe the bug
When an opening parenthesis directly follows an abbreviation-like prefix in German text (here "B(asic)"), sentence splitting crashes with a re.error instead of returning segments.
To Reproduce
from pysbd import Segmenter
text = 'auf der Suche nach Einsätzen als Skilehrer im DACH-Raum. Langjährige Erfahrung im Leiten von Gruppen diverser Altersgruppen und Sportarten. B.A. Sport und Gesundheit in Prävention und Therapie (Deutsche Spothochschule Köln) Zertifikate: Erste Hilfe, DRK Rettungsschwimmer silber, DSHS Fitnesstrainer B(asic) Lizenz, Aquafitness Instructor, Progressive Muskelentspannung'
de_split = Segmenter(language='de')
de_split.segment(text)
This crashes with re.error at:
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/lang/deutsch.py", line 74, in scan_for_replacements
    txt = re.sub(r'(?<={am})\.(?=\s)'.format(am=am), '∯', txt)
Expected behavior
The text is split into sentences without raising an exception.
Additional context
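The crash is triggered by the sequence "B(a" (from "B(asic)"). The matched text is interpolated into the regex in scan_for_replacements without escaping, so its "(" opens a group that is never closed.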
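The failure can be reproduced with the standard library alone. This is a minimal sketch, assuming (not confirmed by the report) that the matched text comes from an abbreviation such as "b.a" whose unescaped "." matches any character, including "("; the variable names are illustrative. The second step uses the same pattern shape as the failing line in deutsch.py.

```python
import re

text = "DSHS Fitnesstrainer B(asic) Lizenz"

# Hypothetical: an abbreviation like "b.a" used as a regex without escaping.
# Its unescaped "." matches any character, so it also matches "(".
abbr = "b.a"
matches = re.findall(r'(?:^|\s){}'.format(abbr), text, flags=re.IGNORECASE)
am = matches[0]

# Same pattern shape as line 74 of deutsch.py (taken from the traceback).
# The "(" inside am opens a group, leaving the look-behind unclosed.
pattern = r'(?<={am})\.(?=\s)'.format(am=am)
try:
    re.compile(pattern)
    error_message = None
except re.error as exc:
    error_message = str(exc)

print(repr(am))        # ' B(a'
print(error_message)   # missing ), unterminated subpattern at position 0
```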
Suggested fix: Add
to scan_for_replacements in deutsch.py.
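One possible way to avoid the crash, assuming the intent is to match the abbreviation text literally, is to pass it through re.escape before formatting it into the pattern. The helper replace_abbr_period below is hypothetical: it mirrors the failing line in deutsch.py but is not pysbd's API, and it is only a sketch, not necessarily the fix proposed above.

```python
import re


def replace_abbr_period(txt: str, am: str) -> str:
    """Hypothetical stand-in for the failing re.sub call in deutsch.py.

    Replaces a period that follows the abbreviation `am` and precedes
    whitespace with the placeholder '∯'.
    """
    # re.escape backslash-escapes regex metacharacters such as "(",
    # so the look-behind group stays balanced and "(" matches literally.
    pattern = r'(?<={am})\.(?=\s)'.format(am=re.escape(am))
    return re.sub(pattern, '∯', txt)


# The input that used to raise re.error is now returned unchanged.
print(replace_abbr_period("Fitnesstrainer B(asic) Lizenz", " B(a"))
# A regular abbreviation still gets its period replaced.
print(replace_abbr_period("Termin mit Dr. Müller", " Dr"))
```

Escaping here only stops the crash. If the abbreviation search itself builds its pattern from unescaped abbreviations (as assumed in the previous sketch), "b.a" would still match "B(a", so that pattern may need re.escape as well.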
Full traceback:
Traceback (most recent call last):
  File "german_fix.py", line 8, in <module>
    de_split.segment(text)
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/segmenter.py", line 87, in segment
    postprocessed_sents = self.processor(text).process()
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/processor.py", line 34, in process
    self.replace_abbreviations()
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/processor.py", line 180, in replace_abbreviations
    self.text = self.abbreviations_replacer().replace()
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/lang/deutsch.py", line 66, in replace
    self.text = self.search_for_abbreviations_in_string(self.text)
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/abbreviation_replacer.py", line 92, in search_for_abbreviations_in_string
    text = self.scan_for_replacements(
  File "/home/erik/.local/lib/python3.8/site-packages/pysbd/lang/deutsch.py", line 74, in scan_for_replacements
    txt = re.sub(r'(?<={am})\.(?=\s)'.format(am=am), '∯', txt)
  File "/usr/lib/python3.8/re.py", line 210, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 759, in _parse
    raise source.error("missing ), unterminated subpattern",
re.error: missing ), unterminated subpattern at position 0