Wrong length check in summary() when using xpath=True results in wrong summaries

When using xpath=True in summary() to extract the main content, you get the wrong result for several webpages. Without xpath=True, the result for the same pages is correct.

The reason is that the length check in the summary function is done on the HTML including the xpath attributes. This should not be the case: the added attributes inflate the measured length, so the result differs between using xpath and not using it, and the length threshold for selecting the summary is implicitly different in the two modes.
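To illustrate the effect, here is a minimal sketch using only the standard library. The attribute name `x`, the sample markup, the threshold value `RETRY_LENGTH = 60`, and the helper `content_length` are all assumptions made for this example and are not taken from the library's code.

```python
import re

# Matches an attribute of the form x="..." (assumed name of the xpath attribute).
X_ATTR = re.compile(r'\s+x="[^"]*"')


def content_length(html):
    """Length of the markup with the assumed x="..." attributes removed."""
    return len(X_ATTR.sub("", html or ""))


# The same fragment without and with the path annotations.
plain = '<div><p>Short text.</p></div>'
annotated = '<div x="/html/body/div"><p x="/html/body/div/p">Short text.</p></div>'

RETRY_LENGTH = 60  # hypothetical threshold, chosen only for this example

# The annotated markup is longer although the visible content is identical,
# so the same threshold accepts one variant and rejects the other.
print(len(plain) >= RETRY_LENGTH)
print(len(annotated) >= RETRY_LENGTH)
print(content_length(annotated) == len(plain))
```

The regular expression is only good enough for this demonstration; it would also strip text that merely looks like such an attribute.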

python-readability/readability/readability.py, line 254 in e4a699b:

article_length = len(cleaned_article or "")

One idea might be to add the xpath attributes to the HTML at the end, after all calculations have been done, rather than at the beginning:

python-readability/readability/readability.py, line 150 in e4a699b:

if self.xpath:
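A rough sketch of that ordering, written against the standard library's xml.etree.ElementTree rather than the library's own code. The functions `annotate_paths` and `summarize`, the `x` attribute name, and the simple positional path format are hypothetical and only serve to show "measure first, annotate last".

```python
import xml.etree.ElementTree as ET


def annotate_paths(root):
    """Set an x attribute on every element holding a simple positional path."""
    def walk(elem, path):
        elem.set("x", path)
        counts = {}
        for child in elem:
            # Number siblings per tag name, starting at 1.
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, "%s/%s[%d]" % (path, child.tag, counts[child.tag]))
    walk(root, "/" + root.tag)


def summarize(markup, xpath=False):
    """Return (serialized markup, measured length); the length ignores x attributes."""
    root = ET.fromstring(markup)
    # Measure before any annotation so both modes see the same length.
    article_length = len(ET.tostring(root, encoding="unicode"))
    if xpath:
        # Annotate only after all length-based decisions have been made.
        annotate_paths(root)
    return ET.tostring(root, encoding="unicode"), article_length


doc = "<div><p>Hi</p><p>there</p></div>"
print(summarize(doc, xpath=False))
print(summarize(doc, xpath=True))
```

With this ordering the measured length is the same in both modes, and only the returned markup differs.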

best,
Thomas