Performance improvement?
dakinggg opened this issue · 7 comments
I am not certain of this, but I suspect there might be room for a performance improvement by using re.compile to precompile all of the needed regexes. Otherwise they will have to be recompiled regularly (once the re module's cache of 100 patterns has been exceeded).
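A minimal sketch of the suggestion, assuming a replacer that applies a fixed set of patterns: compile each pattern once at import time with `re.compile` and reuse the resulting pattern objects. The class name `PeriodReplacer`, the patterns, and the `<ELLIPSIS>`/`<DOT>` placeholders are hypothetical and are not pySBD's actual code.

```python
import re

# Hypothetical patterns, compiled once when the module is imported.
# The compiled objects are reused on every call, so there is no lookup
# in the re module's cache and no recompilation after an eviction.
MULTI_PERIOD_RE = re.compile(r"\.{2,}")              # "..." style ellipses
NUMBER_PERIOD_RE = re.compile(r"(?<=\d)\.(?=\d)")    # decimal points, e.g. "3.14"


class PeriodReplacer:
    """Hypothetical replacer that swaps certain periods for placeholders."""

    def replace(self, text: str) -> str:
        # Call .sub() on the precompiled pattern objects directly.
        text = MULTI_PERIOD_RE.sub("<ELLIPSIS>", text)
        text = NUMBER_PERIOD_RE.sub("<DOT>", text)
        return text
```

For example, `PeriodReplacer().replace("Wait... 3.14")` returns `"Wait<ELLIPSIS> 3<DOT>14"`.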
I don't think there will be any perceivable difference unless some of the regexes are used inside a loop.
Quote from the Python 3 docs for your perusal:
Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you’re accessing a regex within a loop, pre-compiling it will save a few function calls. Outside of loops, there’s not much difference thanks to the internal cache.
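To illustrate the distinction drawn in the quoted docs, here is a minimal sketch (function names and the whitespace pattern are hypothetical) of the same substitution applied in a loop, first through the module-level `re.sub`, which consults the internal cache on every call, and then through a pattern compiled once before the loop. Both produce identical results; only the per-iteration overhead differs.

```python
import re
from typing import List

PATTERN = r"\s{2,}"  # collapse runs of whitespace (illustrative pattern)


def clean_module_level(lines: List[str]) -> List[str]:
    # re.sub(pattern_string, ...) looks the pattern up in re's internal
    # cache on each call and recompiles it if the entry was evicted.
    return [re.sub(PATTERN, " ", line) for line in lines]


def clean_precompiled(lines: List[str]) -> List[str]:
    # Compile once, outside the loop, and reuse the pattern object.
    regex = re.compile(PATTERN)
    return [regex.sub(" ", line) for line in lines]
```

For example, both functions turn `["a  b", "c\t\td", "e f"]` into `["a b", "c d", "e f"]`.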
I think it could matter if there are more than 100 regexes in total (including any that are used inside loops), like here:
pySBD/pysbd/abbreviation_replacer.py, line 62 (commit a2bb451)
Here is another regex in a loop:
pySBD/pysbd/lists_item_replacer.py, line 115 (commit a2bb451)
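If the pattern text itself depends on the loop variable (for example, one pattern per abbreviation; this is an assumption, since the linked lines are not reproduced here), a single `re.compile` cannot simply be hoisted out of the loop. One option is to memoize compilation per distinct pattern with `functools.lru_cache`, so each pattern is compiled at most once no matter how many other regexes compete for the `re` module's own cache. The helper names, the lookbehind/lookahead pattern, and the `<DOT>` placeholder are all hypothetical.

```python
import re
from functools import lru_cache
from typing import Iterable


@lru_cache(maxsize=None)
def abbreviation_pattern(abbr: str) -> re.Pattern:
    # Compiled once per distinct abbreviation, then served from this cache.
    # re.escape makes the "." in e.g. "Dr." match a literal period.
    return re.compile(r"(?<![\w.])" + re.escape(abbr) + r"(?=\s)")


def mark_abbreviations(text: str, abbreviations: Iterable[str]) -> str:
    for abbr in abbreviations:
        # Assumes each abbreviation ends with "."; swap that period for a
        # placeholder. A function replacement avoids backslash processing.
        replacement = abbr[:-1] + "<DOT>"
        text = abbreviation_pattern(abbr).sub(lambda _m: replacement, text)
    return text
```

For example, `mark_abbreviations("Dr. Smith met Mr. Jones.", ["Dr.", "Mr."])` returns `"Dr<DOT> Smith met Mr<DOT> Jones."`.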
Yes, I agree there are regexes inside loops. In that case, would you mind switching them to precompiled patterns and assessing the performance? I would love to assist with that.
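One possible way to assess the performance, sketched with the standard-library `timeit` module: time the same substitution through the module-level `re.sub`, through a precompiled pattern, and through `re.sub` after `re.purge()` has emptied the cache (a rough stand-in for the pattern having been evicted). The sample text, pattern, and iteration count are arbitrary choices for illustration; absolute timings depend on the machine and Python version, so only the relative differences are meaningful.

```python
import re
import timeit

TEXT = "Mr. Smith bought cheapsite.com for 1.5 million dollars. " * 20
PATTERN = r"(?<=\d)\.(?=\d)"  # periods between digits (illustrative)
COMPILED = re.compile(PATTERN)


def with_module_function() -> str:
    # Cache-hit path: the pattern string is found in re's internal cache.
    return re.sub(PATTERN, "<DOT>", TEXT)


def with_precompiled() -> str:
    return COMPILED.sub("<DOT>", TEXT)


def with_purged_cache() -> str:
    # re.purge() empties re's internal cache, forcing a recompile on the
    # next re.sub call (approximates the cost after an eviction).
    re.purge()
    return re.sub(PATTERN, "<DOT>", TEXT)


def benchmark(number: int = 10_000) -> dict:
    # Sanity check: all variants must produce identical output.
    assert with_module_function() == with_precompiled() == with_purged_cache()
    return {
        "re.sub": timeit.timeit(with_module_function, number=number),
        "compiled.sub": timeit.timeit(with_precompiled, number=number),
        "re.sub after purge": timeit.timeit(with_purged_cache, number=number),
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print(f"{name:>18}: {seconds:.4f} s")
```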
Not sure when I will have time to do it, but I can try at some point.
Cool, same here! I will let you know if I happen to do this performance exercise.
Fixed #71