Infinite loop?
matthen commented
The code below seems to hang forever:
import pysbd
segmenter = pysbd.Segmenter(language="en", clean=False)
text = "..[111 111 111 111 111 111 111 111 111 111]"
segmenter.segment(text)
When I interrupt it, I get this traceback:
Traceback (most recent call last):
File "check.py", line 5, in <module>
segmenter.segment(text)
File ".../python3.7/site-packages/pysbd/segmenter.py", line 87, in segment
postprocessed_sents = self.processor(text).process()
File ".../python3.7/site-packages/pysbd/processor.py", line 37, in process
self.replace_periods_before_numeric_references()
File ".../python3.7/site-packages/pysbd/processor.py", line 141, in replace_periods_before_numeric_references
r"∯\2\r\7", self.text)
File ".../python3.7/re.py", line 192, in sub
return _compile(pattern, flags).sub(repl, string, count)
KeyboardInterrupt
This is pysbd version 0.3.3 on Python 3.7.7.
Could it be stuck in an infinite loop?
(I found this bug by running pysbd over Wikipedia. On the article https://en.wikipedia.org/wiki/Clojure it tripped up on "...[484 216 622 139 651 592 379 228 242 355]".)
nipunsadvilkar commented
It's due to catastrophic backtracking in NUMBERED_REFERENCE_REGEX. I need to dig into the details.
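The following illustrates the failure mode. It uses a toy pattern, not pysbd's actual NUMBERED_REFERENCE_REGEX, whose definition isn't quoted in this thread. A nested quantifier such as `(\d+\s?)+` lets Python's backtracking `re` engine split a single run of digits into groups in exponentially many ways, and the engine tries all of them before it concludes there is no match. The names `BAD`, `GOOD` and `seconds_to_fail` are hypothetical. The sketch assumes that rewriting the pattern so each digit run can be split only one way is enough to keep a failed match roughly linear in the input length.

```python
import re
import time

# Toy pattern for illustration only; NOT pysbd's NUMBERED_REFERENCE_REGEX.
# The nested quantifier (\d+\s?)+ lets the engine split one digit run such as
# "1111" into groups in many ways ("1111", "1|111", "11|11", ...), and it tries
# every split before giving up when the closing "]" is missing.
BAD = re.compile(r"\[(\d+\s?)+\]")

# Accepts the same strings, but every digit run after the first must start
# right after exactly one whitespace character. The input can therefore be
# split only one way, and a failed match costs roughly linear time.
GOOD = re.compile(r"\[\d+(?:\s\d+)*\s?\]")


def seconds_to_fail(pattern, text):
    """Time a search on `text` that is expected to find no match."""
    start = time.perf_counter()
    assert pattern.search(text) is None
    return time.perf_counter() - start


# An unclosed reference: "..[" followed by digits and no "]".
for n in (14, 16, 18):
    text = "..[" + "1" * n
    print(f"n={n}: bad={seconds_to_fail(BAD, text):.4f}s "
          f"good={seconds_to_fail(GOOD, text):.6f}s")
```

Each extra two digits should roughly quadruple the `BAD` time, while `GOOD` stays near zero. On Python 3.11+, an atomic group such as `\[(?>\d+\s?)+\]` is another way to stop the engine from re-splitting a digit run. Whether either rewrite fits pysbd's real pattern would need to be checked against its source.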
ajar19 commented
Hi @nipunsadvilkar, we ran into the same issue with a different text:
text = "......[289852000000260698,289852000000260744"
Any update on this, please?