html to etree

Parse html to lxml etree

Convenience methods for parsing html documents to lxml etree.

Lxml has limited capabilities for handling different encodings, and this library is intended as a reusable utility parsing byte-code html responses into ElementTrees using sane character decoding.

Free software: BSD license
Python versions: 2.7, 3.4+

Features

Parse html to lxml etree
Handle character decoding

Quickstart

Parse HTML given as byte strings:

tree = parse_html_bytes(body=body_bytes, content_type=res.headers.get('content-type'))

Parse HTML given as already decoded unicode string:

tree = parse_html_unicode(uni_string=body_unicode)

Credits

This package was created with Cookiecutter and the `fluquid/cookiecutter-pypackage`_ project template.

PythonLinks/html-to-etree

html to etree

Features

Quickstart

Credits