Try to take div itemprop="articleBody" into account
hecmec commented
Hello,
thanks for your module, it is working nicely.
I've just had one small issue with text extraction.
Your calculateBestNode() function doesn't consider div or article elements as candidates, and it doesn't check for the schema.org attribute itemprop="articleBody". Nodes marked with this itemprop are usually very good candidates for the main article text.
Example:
http://www.lemonde.fr/election-presidentielle-2017/article/2016/12/02/et-hollande-renonca-a-se-representer_5042285_4854003.html
On this page, your module selects the parent.parent (the grandparent) of the article node, so the content menu ends up in the extracted text.
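
To make the idea concrete, here is a minimal sketch of the check I have in mind. It is written in Python with only the standard library and is not your module's code: the names ArticleBodyExtractor and extract_article_body, and the sample markup, are made up for illustration. It assumes reasonably well-formed HTML (every non-void element is closed) and returns None when no marked node exists, so the usual scoring could still run as a fallback.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleBodyExtractor(HTMLParser):
    """Collects text inside the first div/article marked itemprop="articleBody"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0      # nesting depth inside the marked element; 0 means outside
        self.done = False   # True once the first marked element has been closed
        self.chunks = []    # text fragments found inside the marked element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.done and tag in ("div", "article"):
            # itemprop may hold several space-separated property names.
            itemprop = dict(attrs).get("itemprop") or ""
            if "articleBody" in itemprop.split():
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_article_body(html):
    """Return the articleBody text, or None so the caller can fall back to scoring."""
    parser = ArticleBodyExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks) if parser.chunks else None


# Hypothetical markup: a menu next to the real article body, as in the page above.
sample = """
<div class="page">
  <ul class="content-menu"><li>Share</li><li>Print</li></ul>
  <div itemprop="articleBody"><p>First paragraph.</p><p>Second<br>paragraph.</p></div>
</div>
"""
print(extract_article_body(sample))
```

With a check like this running before the scoring step, the menu items in the sample are skipped because they sit outside the marked node.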
Thanks
Hector