FutureWarning: Use specific 'len(elem)' or 'elem is not None' test instead.

Question


web64 opened this issue 7 years ago · 4 comments

Hi,

I'm getting this warning:

readability/htmls.py:117: FutureWarning: The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead.

I'm running Python 3.5.2

Cheers!

Answer 1 · 2018-11-09T18:41:55.000Z

Same here.
Any news on this?
What do we need to change?

Answer 2 · 2022-12-01T22:25:12.000Z

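It appears to be the `doc.body or doc` expression in readability/htmls.py. Using an lxml element in a boolean context like this is what triggers the warning.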
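For context: an lxml element used in a boolean context is truthy only when it has child elements, and lxml warns that this behaviour will change. So `doc.body or doc` also falls back to the whole document when `<body>` has no child elements. A minimal sketch of the explicit test the warning asks for, assuming a hypothetical helper named `body_or_doc` (not part of readability) and a plain, non-namespaced `<body>` tag:

```python
from lxml import etree


def body_or_doc(doc):
    """Return the <body> element of doc, or doc itself if there is none.

    Hypothetical helper: replaces `doc.body or doc` with an explicit
    `is not None` test, so element truthiness is never evaluated and a
    childless <body> is still returned. Assumes a non-namespaced <body>
    (plain HTML, not XHTML).
    """
    body = doc.find(".//body")  # None when no <body> element exists
    return body if body is not None else doc


if __name__ == "__main__":
    root = etree.Element("html")
    etree.SubElement(root, "body")  # empty body: no child elements
    print(body_or_doc(root).tag)    # body
```

With `or`, the empty `<body>` above would be falsy and the whole `<html>` element would be used instead.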

Answer 3 · 2022-12-01T22:45:35.000Z

I was actually getting bad results, not just warnings: a string containing the repr of a bytes object. Simple sample code did not show this; it only happened with a real web page. It's unclear whether this is related (it might warrant a new issue).
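A string holding "the repr of a bytes object" is what Python 3 produces when `str()` is called on `bytes` instead of decoding them, and lxml's `tostring()` returns bytes unless asked for text. A small sketch contrasting the options, using a made-up `<p>` element as sample data:

```python
from lxml import etree

elem = etree.fromstring("<p>hello</p>")

as_bytes = etree.tostring(elem, encoding="utf-8")  # bytes: b'<p>hello</p>'
wrong = str(as_bytes)             # "b'<p>hello</p>'": the bytes repr, not markup
right = as_bytes.decode("utf-8")  # "<p>hello</p>"
direct = etree.tostring(elem, encoding="unicode")  # str, no manual decode needed

print(wrong)
print(right)
print(direct)
```

Passing `encoding="unicode"` returns `str` directly, which avoids the separate `.decode()` step.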

I ended up monkey-patching in a hack. I still got the warning, but at least it worked:

from lxml.etree import tostring
from readability import Document  # https://github.com/buriy/python-readability/   pip install readability-lxml
from readability.cleaners import clean_attributes

## monkey patch

def get_body(doc):
    # Remove <script>, <link> and <style> elements before serializing
    for elem in doc.xpath(".//script | .//link | .//style"):
        elem.drop_tree()
    # tostring() returns bytes; decode them so we get the markup itself
    # rather than a str holding the repr of a bytes object.
    # NOTE: `doc.body or doc` still tests element truthiness, so lxml's
    # FutureWarning is still emitted on this line.
    raw_html = tostring(doc.body or doc, encoding='utf-8').decode('utf-8')
    # Remove unwanted attributes using readability's own cleaner
    return clean_attributes(raw_html)


def content(self):
    """Returns document body"""
    return get_body(self._html(True))

Document.content = content
## monkey patch
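The patch above still emits the warning. If hiding it is acceptable until the code stops relying on element truthiness, here is a minimal sketch using only the standard `warnings` module. The function name is hypothetical, and the message prefix is copied from the warning quoted in the question:

```python
import warnings

# Start of lxml's element-truthiness warning text, as quoted in the question
LXML_BOOL_MSG = "The behavior of this method will change in future versions"


def silence_lxml_truthiness_warning():
    """Hypothetical helper: ignore only this FutureWarning; others still show."""
    warnings.filterwarnings(
        "ignore",
        message=LXML_BOOL_MSG,  # regex matched against the start of the message
        category=FutureWarning,
    )
```

Call it once at startup, before using readability. Filtering only hides the message: a `<body>` without child elements still falls back to the whole document, so an explicit `is not None` test remains the real fix.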

Answer 4 · 2023-01-06T04:28:28.000Z

I was using one line to validate the response of a tag