selectolax

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

Primary language: Cython. License: MIT.

A fast HTML5 parser with CSS selectors using Modest and Lexbor engines.

Installation

From PyPI using pip: `pip install selectolax`

Development version from GitHub:

How to compile selectolax while developing:

Basic examples

Available backends

Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. The two backends offer nearly identical features, but some differences remain.

The Lexbor backend is currently in beta and lacks some features.

To use Lexbor, import its parser and use it in much the same way as HTMLParser.

Simple Benchmark

  • Extract title, links, scripts and a meta tag from main pages of top 754 domains. See examples/benchmark.py for more information.

| Package | Time |
|---|---|
| Beautiful Soup (html.parser) | 61.02 sec. |
| lxml | 9.09 sec. |
| html5_parser | 16.10 sec. |
| selectolax (Modest) | 2.94 sec. |
| selectolax (Lexbor) | 2.39 sec. |

Links

License