Regex concatenation performance concern
Bilge opened this issue · 3 comments
Crawler-Detect/src/CrawlerDetect.php
Lines 99 to 102 in 02b24e5
This mass concatenation of pattern pieces into a final regex should be precomputed by the library instead of making consuming applications perform this work on every instantiation of the CrawlerDetect
object. That is, the fixture arrays should be transformed into the final regex string and output to a precomputed PHP file to save consumers from this unnecessary overhead. I suggest a PHP file because opcache can automatically cache the file for additional performance gains.
Example
patterns.php
<?php
return '/(.*Java.*outbrain| YLT|^b0t$|^bluefish ...)/i';
Again, I would like to see some benchmark figures to see what kind of improvement could be gained here.
Concatenating the regex into one precompiled regex string would also not allow us to easily test regex collisions...
Crawler-Detect/tests/UATests.php
Lines 112 to 126 in ddfeeed
Concatenating the regex into one precompiled regex string would also not allow us to easily test regex collisions...
That's simply not true because the precomputed list cannot be maintained if the original list is deleted. Ergo the original array still exists and can still be used in the test. They're not mutually exclusive.
An example would help here, it's not clear what you mean