FutureWarning: Use specific 'len(elem)' or 'elem is not None' test instead.
web64 opened this issue · 4 comments
web64 commented
Hi,
I'm getting this warning:
readability/htmls.py:117: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.
I'm running Python 3.5.2
Cheers!
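For anyone trying to reproduce this outside of readability: the warning comes from lxml when an element is used in a boolean test. A minimal sketch (whether the warning actually fires depends on the lxml version installed):

import warnings
from lxml import html

# Show the FutureWarning every time it is raised (the default is once per call site).
warnings.simplefilter("always", FutureWarning)

doc = html.fromstring("<html><body><p>hi</p></body></html>")

# Truth-testing an element like this is the pattern the warning complains about.
if doc.body:
    print("body evaluated as truthy")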
noembryo commented
Same here.
Any news on this?
What is it that we have to correct?
clach04 commented
It appears to be the 'doc.body or doc' statement (readability/htmls.py:117 in the warning above).
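The fix the warning asks for is an explicit None test instead of relying on the element's truth value. A minimal, hypothetical sketch of that substitution (not an actual patch from the project):

from lxml import html
from lxml.etree import tostring

doc = html.fromstring("<html><body><p>example</p></body></html>")

# Instead of 'tostring(doc.body or doc)', which truth-tests the element,
# pick the root explicitly with an 'is not None' check:
root = doc.body if doc.body is not None else doc
print(tostring(root))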
clach04 commented
I was actually getting bad results, not just warnings: a string containing the repr of a byte buffer. Simple sample code did not show this, only a real web page did. It's unclear whether that is related (it might warrant a new issue).
I ended up monkey patching in a hack. It still produced the warning, but at least it worked:
from lxml.etree import tostring

import readability
from readability import Document  # https://github.com/buriy/python-readability/  (pip install readability-lxml)


## monkey patch
def get_body(doc):
    for elem in doc.xpath(".//script | .//link | .//style"):
        elem.drop_tree()
    # tostring() always returns a utf-8 encoded byte string, so decode it back
    # to str here; the earlier attempts below produced the "repr of a byte
    # buffer" results mentioned above.
    # FIXME: wouldn't it be better to use tounicode()?
    #raw_html = str_(tostring(doc.body or doc))
    #raw_html = tostring(doc.body or doc)
    raw_html = tostring(doc.body or doc, encoding='utf-8').decode('utf-8')
    cleaned = readability.cleaners.clean_attributes(raw_html)
    try:
        # BeautifulSoup(cleaned)  # FIXME: do we really need to try loading it?
        return cleaned
    except Exception:  # FIXME: find the equivalent lxml error
        # logging.error("cleansing broke html content: %s\n---------\n%s" % (raw_html, cleaned))
        return raw_html


def content(self):
    """Returns document body"""
    return get_body(self._html(True))


Document.content = content
## monkey patch
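With the patch applied, usage is unchanged from the normal Document API. A minimal sketch, assuming raw_page_html holds the fetched page source (the variable name is just for illustration):

raw_page_html = "<html><body><p>Example article text.</p><script>var x = 1;</script></body></html>"

doc = Document(raw_page_html)
print(doc.short_title())
print(doc.content())  # now goes through the patched get_body() above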
Mustafahubs commented