
Python binding to the Modest engine (a fast HTML5 parser with CSS selectors).

Primary language: Python. License: MIT.

selectolax


A fast HTML5 parser with CSS selectors, using the Modest engine.

Installation

From PyPI using pip:

pip install selectolax 

Development version from GitHub:

git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install

How to compile selectolax while developing:

make clean
make dev

Examples

from selectolax.parser import HTMLParser

html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
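# Odd-numbered direct children of <div> that do not contain an <a> element
selector = "div > :nth-child(2n+1):not(:has(a))"

for node in HTMLParser(html).css(selector):
    print(node.attributes, node.text(), node.tag)  # attribute dict, text content, tag name
    print(node.parent.tag)  # tag of the enclosing element
    print(node.html)  # outer HTML of the matched node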

Simple Benchmark

  • Average of 10 experiments, each parsing 800 Google SERP pages and extracting their URLs.
Package      Time         Memory (peak)
selectolax   2.38 sec.    768.11 MB
lxml         18.67 sec.   769.21 MB

License