
Convenience method for parsing HTML to an lxml ElementTree using sane character decoding.

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

html to etree


Parse html to lxml etree

Convenience methods for parsing html documents to lxml etree.

lxml has limited capabilities for handling different character encodings. This library is intended as a reusable utility for parsing raw HTML responses, given as bytes, into ElementTrees using sane character decoding.
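The decoding problem described above can be illustrated with a minimal, stdlib-only sketch. It is not this library's implementation: the helper `decode_html`, its parameters, the precedence order (byte-order mark, then the charset from the HTTP header, then a `<meta>` declaration in the first 1024 bytes, then a UTF-8 default) and the use of `errors="replace"` are all assumptions, loosely modeled on how browsers sniff encodings.

```python
import codecs
import re
from typing import Optional, Tuple

# Hypothetical helper: a sketch of "sane" decoding, not html_to_etree's actual code.

# Byte-order marks, checked first. UTF-32 LE precedes UTF-16 LE because
# the UTF-16 LE BOM is a prefix of the UTF-32 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Matches <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def _normalize(label: Optional[str]) -> Optional[str]:
    """Return Python's canonical codec name, or None if the label is unknown."""
    if not label:
        return None
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


def decode_html(body: bytes, header_charset: Optional[str] = None,
                default: str = "utf-8") -> Tuple[str, str]:
    """Decode raw HTML bytes and return (text, encoding_used)."""
    # 1. A BOM is the strongest signal; it is stripped from the decoded text.
    for bom, enc in _BOMS:
        if body.startswith(bom):
            return body[len(bom):].decode(enc, errors="replace"), enc
    # 2. Charset from the HTTP header, then 3. a <meta> tag in the first 1024 bytes.
    match = _META_CHARSET.search(body[:1024])
    meta_charset = match.group(1).decode("ascii") if match else None
    # Unknown labels normalize to None and fall through to the next source.
    encoding = _normalize(header_charset) or _normalize(meta_charset) or default
    # errors="replace" keeps a few bad bytes from aborting the whole parse.
    return body.decode(encoding, errors="replace"), encoding
```

The resulting `str` could then be handed to lxml, for example via `lxml.html.document_fromstring(text)`. Normalizing labels through `codecs.lookup` means aliases such as `latin1` and `ISO-8859-1` resolve to the same codec.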

  • Free software: BSD license
  • Python versions: 2.7, 3.4+

Features

  • Parse html to lxml etree
  • Handle character decoding

Quickstart

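Parse HTML given as a byte string (for example the raw body of an HTTP response), passing the Content-Type header so that its charset can inform decoding: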

from html_to_etree import parse_html_bytes
tree = parse_html_bytes(body=body_bytes, content_type=res.headers.get('content-type'))
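In the call above, `res` is assumed to be an HTTP response object (for example one returned by `requests`), and `res.headers.get('content-type')` may be `None` or carry a quoted, mixed-case charset. A minimal stdlib sketch of extracting that charset follows; the helper name `charset_from_content_type` is hypothetical, and this is not necessarily how `parse_html_bytes` reads the header.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Hypothetical helper: extract the charset parameter from a Content-Type value."""
    if not content_type:
        # e.g. the response had no Content-Type header, so .get() returned None
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() unquotes the parameter and lower-cases it;
    # it returns None when no charset parameter is present.
    return msg.get_content_charset()
```

Delegating to `email.message.Message` avoids hand-rolled splitting on `;` and `=`, which tends to break on quoted values or stray whitespace.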

Parse HTML given as an already decoded unicode string:

from html_to_etree import parse_html_unicode
tree = parse_html_unicode(uni_string=body_unicode)
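One pitfall for unicode input: lxml refuses to parse a `str` that still carries an XML encoding declaration (such as `<?xml version="1.0" encoding="utf-8"?>`) and raises a `ValueError`, because the declaration no longer describes the already-decoded text. A minimal sketch of one workaround, stripping a leading declaration before parsing, is shown below; the helper `strip_xml_declaration` is hypothetical and this is not necessarily what `parse_html_unicode` does.

```python
import re

# Hypothetical helper, not part of html_to_etree.
# Matches a leading XML declaration such as <?xml version="1.0" encoding="utf-8"?>,
# optionally preceded by a BOM character and/or whitespace.
_XML_DECL = re.compile(r"^\ufeff?\s*<\?xml[^>]*\?>", re.IGNORECASE)


def strip_xml_declaration(uni_string: str) -> str:
    """Remove a leading XML declaration so lxml will accept the str input."""
    return _XML_DECL.sub("", uni_string, count=1)
```

Anchoring the pattern at the start of the string ensures that a stray `?>` later in the document is never touched.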

Credits

This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.