DOM Cleaner: mb_eregi_replace errors out with retry-limit-in-match
half0wl opened this issue · 2 comments
Reproduction:
>>> use PHPHtmlParser\Dom;
>>> $dom = new Dom;
>>> $dom->loadFromUrl("https://casper.com/gifts/?clickid=T02U6OVQYxyLUbdwUx0Mo36dUkB1HNWwiSMnwQ0");
Throws:
PHP Warning: mb_eregi_replace(): mbregex search failure in php_mbereg_replace_exec(): retry-limit-in-match over in <stripped>/paquettg/php-html-parser/src/PHPHtmlParser/Dom/Cleaner.php on line 81
PHPHtmlParser\Exceptions\LogicalException with message 'mb_eregi_replace returned false instead of a string.
Error when attempting to remove scripts 2.'
I tried ini_set("pcre.backtrack_limit", "10000000000")
after some Googling on the error, but it has no effect.
I can reproduce this on pages with huge <script></script>
tags, typically ones that contain a giant blob of JSON.
I have the exact same problem, but with a different URL. As a quick fix, I disabled script removal from the HTML with
$dom->setOptions((new Options())->setRemoveScripts(false));
but I would rather have a real fix for this, especially because there's a warning that keeping script tags could have unforeseen consequences.
Any help on this issue, please, @paquettg?
OK, I've fixed it without disabling script removal by raising the mbstring regex retry limit to 10 million. The comments in the stock php.ini describe the directive:
; This directive specifies maximum retry count for mbstring regular expressions. It is similar
; to the pcre.backtrack_limit for PCRE.
; Default: 1000000
;mbstring.regex_retry_limit=1000000
so I've used
ini_set("mbstring.regex_retry_limit", "10000000");
and everything works fine on this front now.
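For anyone who would rather not raise the limit for the whole process, here is a minimal sketch that raises mbstring.regex_retry_limit only around a single call and restores the previous value afterwards. The helper name withRegexRetryLimit is hypothetical and not part of php-html-parser. The directive is documented as available from PHP 7.4; on older versions ini_set() should return false, in which case the helper simply runs the callback with no change.

```php
<?php
// Hypothetical helper (not part of php-html-parser): run $fn with a raised
// mbstring.regex_retry_limit, then put the previous value back.
function withRegexRetryLimit(int $limit, callable $fn)
{
    // ini_set() returns the old value as a string, or false on failure
    // (for example when the directive does not exist).
    $previous = ini_set('mbstring.regex_retry_limit', (string) $limit);

    try {
        return $fn();
    } finally {
        // Restore only if the earlier ini_set() actually changed something.
        if ($previous !== false) {
            ini_set('mbstring.regex_retry_limit', $previous);
        }
    }
}

// Inside the callback the raised limit is in effect.
$limitInside = withRegexRetryLimit(10000000, function () {
    return ini_get('mbstring.regex_retry_limit');
});

// Assumed usage with the library from this issue ($url is a placeholder):
// $dom = withRegexRetryLimit(10000000, function () use ($url) {
//     $dom = new PHPHtmlParser\Dom;
//     $dom->loadFromUrl($url);
//     return $dom;
// });
```

The try/finally means the original limit is restored even if the parser throws, so the larger limit does not leak into unrelated mb_ereg calls elsewhere in the application.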