PHP Warning: Division by zero (because $topCandidate->contentScore is zero)
Closed this issue · 4 comments
Hello!
Here is my sample code:
use andreskrey\Readability\Configuration;
use andreskrey\Readability\Readability;

$url = 'https://france.googleblog.com/2018/05/google-celebre-loeuvre-du-realisateur.html';
$html = file_get_contents($url);
$config = new Configuration();
$config->setSummonCthulhu(true);
$readability = new Readability($config);
$readability->parse($html);
As a result, I get several PHP warnings:
PHP Warning: Division by zero in vendor/andreskrey/readability.php/src/Readability.php on line 986
Same issue with those URLs:
- http://fr.ign.com/castle-rock/36537/trailer/castle-rock-une-grosse-bande-annonce
- http://juanmartorano.blogspot.fr/2018/05/discurso-de-nicolas-maduro-el-dia-del.html
- http://lecontainer.blogspot.fr/2018/04/ulf_16.html
- http://sk.ru/news/m/tenderdocs/21616.aspx
- http://www.fortressofsolitude.co.za/celine-dion-deadpool-2-soundtrack-ashes/
- http://www.hyundaiclubtr.com/konu/elektirik-sikintisi.42916/
- http://www.justjared.com/2018/05/03/sandra-bullocks-stalker-dead-after-5-hour-standoff-with-police/
- https://chromereleases.googleblog.com/2018/05/beta-channel-update-for-chrome-os.html?showComment=1525360703861
- https://cloudplatform.googleblog.com/2018/05/Exploring-container-security-Using-Cloud-Security-Comma.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ClPlBl+%28Cloud+Platform+Blog%29
- https://france.googleblog.com/2018/05/google-celebre-loeuvre-du-realisateur.html
- https://geektyrant.com/news/avengers-infinity-war-writers-explain-that-avengers-4-doesnt-do-what-you-think-it-does
- https://letterboxd.com/captstevezissou/film/valerian-and-the-city-of-a-thousand-planets/
- http://darkmarket.pw/threads/udostoverenija-i-svidetelstva-na-rabochie-specialnosti.5956/
In some cases, disabling SummonCthulhu solves the problem.
Thanks for your help!
Thanks for all the examples, I'll check it out now.
This one was interesting. All of those articles are extremely thin on content, and because of cases like this one I would like to port the IsProbablyReaderable function so they can be discarded quickly instead of running the full algorithm.
Anyway, this also happens in the JS version, but apparently in JavaScript dividing by zero silently returns NaN (or Infinity) instead of raising a notice. I added a small safety check plus a test case.
Fixed via 4359f3c
BTW all your test cases pass with this change, although parsing is not perfect in some of them.
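For readers hitting the same warning, the kind of safety check described above might look roughly like the sketch below. It is not the code from the fix commit: the helper names scoreRatio and isCloseToTop and the 0.75 threshold are assumptions made up for illustration. The idea is only to test the divisor before dividing, so a top candidate with a content score of zero never reaches the division.

```php
<?php
// Hypothetical helpers, NOT the library's actual code: they only illustrate
// guarding a score comparison against a zero divisor.

// Returns candidate/top, or null when the top score is zero.
function scoreRatio(float $candidateScore, float $topScore): ?float
{
    // Dividing by a zero top score is what triggers "Division by zero"
    // (a warning in PHP 7, a DivisionByZeroError in PHP 8).
    if ($topScore == 0.0) {
        return null;
    }

    return $candidateScore / $topScore;
}

// True when the candidate's score is within the (assumed) threshold of the top score.
function isCloseToTop(float $candidateScore, float $topScore, float $threshold = 0.75): bool
{
    $ratio = scoreRatio($candidateScore, $topScore);

    // A null ratio means "no meaningful comparison", so treat it as not close.
    return $ratio !== null && $ratio >= $threshold;
}

var_dump(isCloseToTop(15.0, 20.0)); // bool(true)
var_dump(isCloseToTop(0.0, 0.0));   // bool(false)
```

Returning null rather than 0 keeps "no ratio available" distinguishable from a genuine ratio of zero, which may matter if the caller wants to log or skip such thin pages.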
Thank you very much @andreskrey!