andreskrey/readability.php

PHP Warning: Division by zero (because $topCandidate->contentScore is zero)

Closed this issue · 4 comments

blat commented

Hello!

Here is my sample code:

use andreskrey\Readability\Configuration;
use andreskrey\Readability\Readability;

$url = 'https://france.googleblog.com/2018/05/google-celebre-loeuvre-du-realisateur.html';
$html = file_get_contents($url);
$config = new Configuration();
$config->setSummonCthulhu(true);
$readability = new Readability($config);
$readability->parse($html);

As a result, I get several PHP warnings:

PHP Warning: Division by zero in vendor/andreskrey/readability.php/src/Readability.php on line 986

The same issue occurs with these URLs:

In some cases, disabling SummonCthulhu solves the problem.

Thanks for your help!

andreskrey commented

Thanks for all the examples, I'll check it out now.

This one was interesting. All of those articles are extremely thin on content. Because of cases like this one, I would like to port the isProbablyReaderable function so that such pages can be discarded quickly instead of going through the full algorithm.
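As a rough illustration of what a quick pre-check for thin pages could look like, here is a minimal sketch. It is not the library's code and not a faithful port of the JS function: the name `looksReaderable`, the two thresholds, and the square-root scoring are all assumptions chosen for the example.

```php
<?php
// Hypothetical pre-check: decide cheaply whether a page has enough
// paragraph text to be worth running the full extraction algorithm.
// Function name, thresholds, and scoring are illustrative assumptions.
function looksReaderable(string $html, int $minContentLength = 140, float $minScore = 20.0): bool
{
    if (trim($html) === '') {
        return false; // nothing to inspect
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect libxml errors silently.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }

    $score = 0.0;
    foreach (['p', 'pre'] as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // Byte length is good enough for a coarse heuristic.
            $length = strlen(trim($node->textContent));
            if ($length < $minContentLength) {
                continue; // too short to count as article text
            }
            $score += sqrt($length - $minContentLength);
            if ($score > $minScore) {
                return true; // enough long paragraphs found, stop early
            }
        }
    }

    return false;
}
```

A caller could run this before `parse()` and skip pages for which it returns `false`, at the cost of occasionally rejecting short but legitimate articles.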

Anyway, this also happens in the JS version, but apparently in JavaScript dividing by zero returns NaN without any notice. I added a small safety check plus a test case.
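A guard of this kind might look like the sketch below. This is not the code from the fix commit: the helper name `scoreRatio` and the choice to return `0.0` for a zero top score are assumptions made for illustration.

```php
<?php
// Hypothetical helper: compare a candidate's score against the top
// candidate's score without ever dividing by zero.
function scoreRatio(float $candidateScore, float $topScore): float
{
    // Assumption: a top score of zero means there is nothing meaningful
    // to compare against, so report a ratio of 0.0.
    if ($topScore == 0.0) {
        return 0.0;
    }

    return $candidateScore / $topScore;
}

// Usage: the zero case falls through to 0.0 instead of raising a warning.
$ratio = scoreRatio(3.0, 4.0);
$thin  = scoreRatio(0.0, 0.0);
```

Returning a fixed value keeps any later threshold comparison well defined. In PHP 7 the unguarded division only emits a warning, while PHP 8 throws a `DivisionByZeroError`, so an explicit check avoids both.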

Fixed via 4359f3c

BTW all your test cases pass with this change, although parsing is not perfect in some of them.

blat commented

Thank you very much @andreskrey!