Wrong length check in summary when using xpath=True results in wrong summaries
yeus opened this issue · 0 comments
When using xpath=True in summary to extract the main content, you get the wrong result for several webpages; without it, the result is correct.
The reason is that the length check in the summary function is done on the HTML including the xpath attributes, which should not be the case. The same page therefore gives different results with and without xpath, and the added attributes implicitly change the effective length threshold for selecting the summary.
python-readability/readability/readability.py
Line 254 in e4a699b
One idea might be to add the xpath attributes to the HTML at the end, after all calculations have been done, rather than at the beginning:
python-readability/readability/readability.py
Line 150 in e4a699b
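To illustrate the effect, here is a minimal standalone sketch that uses only the standard library, not the actual readability code. The attribute name "x", the path format, the helper names and the threshold of 250 are assumptions made for the example. It shows how annotating every element before measuring can push a short document over a length threshold, and how measuring first and annotating afterwards avoids that.

```python
import xml.etree.ElementTree as ET

RETRY_LENGTH = 250  # illustrative threshold, not necessarily the library's value


def annotate(root):
    """Attach a hypothetical path-like attribute "x" to every element."""
    for index, element in enumerate(root.iter()):
        element.set("x", f"/node[{index}]")


def serialized_length(root):
    """Length of the serialized markup, attributes included."""
    return len(ET.tostring(root, encoding="unicode"))


html = "<div>" + "<p>some text here</p>" * 10 + "</div>"

# Order described in this issue: annotate first, so attributes count toward the length.
early = ET.fromstring(html)
annotate(early)
length_annotated = serialized_length(early)

# Proposed order: measure first, annotate only after the decision is made.
late = ET.fromstring(html)
length_plain = serialized_length(late)
annotate(late)

print(length_plain, length_annotated)
print(length_plain >= RETRY_LENGTH, length_annotated >= RETRY_LENGTH)
```

With the proposed order the final tree still carries the attributes, but the length decision is the same as it would be without xpath.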
best,
Thomas