
Combine XPath, CSS selectors, and JSONPath for web data extraction.


Data Extractor





Install the stable version from PyPI.

pip install "data-extractor[jsonpath-extractor]"  # for extracting JSON data
pip install "data-extractor[lxml]"  # for extracting HTML data

Or install the latest version from GitHub.

pip install "data-extractor[jsonpath-extractor] @ git+https://github.com/linw1995/data_extractor.git@master"
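After installing, it can be useful to confirm which optional backends are importable in the current environment. The following is a minimal sketch using only the standard library. The import names `jsonpath` (for the jsonpath-extractor extra) and `lxml` are assumptions, and `available_backends` is a hypothetical helper, not part of data_extractor.

```python
from importlib.util import find_spec

# Assumed top-level import names for the extras installed above.
# Adjust the mapping if your environment uses different backends.
OPTIONAL_BACKENDS = {"JSON": "jsonpath", "HTML/XML": "lxml"}


def available_backends(backends=OPTIONAL_BACKENDS):
    """Map each data kind to True if its backend module can be imported."""
    return {
        kind: find_spec(module) is not None
        for kind, module in backends.items()
    }


if __name__ == "__main__":
    for kind, ok in available_backends().items():
        print(f"{kind}: {'available' if ok else 'missing'}")
```

A backend reported as missing suggests that the matching `pip install` command above has not been run in this environment.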

Extract JSON data

JSON extraction relies on optional dependencies.

Install one of them, for example the jsonpath-extractor extra shown in the install commands above, to extract JSON data.

Extract HTML(XML) data

HTML (XML) extraction relies on optional dependencies, for example the lxml extra shown in the install commands above.


from data_extractor import Field, Item, JSONExtractor


class Count(Item):
    followings = Field(JSONExtractor("countFollowings"))
    fans = Field(JSONExtractor("countFans"))


class User(Item):
    # The attribute is `name_`, but the result key is "name".
    name_ = Field(JSONExtractor("name"), name="name")
    # Falls back to 17 when "age" is missing from the input.
    age = Field(JSONExtractor("age"), default=17)
    count = Count()


assert User(JSONExtractor("data.users[*]"), is_many=True).extract(
    {
        "data": {
            "users": [
                {
                    "name": "john",
                    "age": 19,
                    "countFollowings": 14,
                    "countFans": 212,
                },
                {
                    "name": "jack",
                    "description": "",
                    "countFollowings": 54,
                    "countFans": 312,
                },
            ]
        }
    }
) == [
    {"name": "john", "age": 19, "count": {"followings": 14, "fans": 212}},
    {"name": "jack", "age": 17, "count": {"followings": 54, "fans": 312}},
]
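For comparison, here is roughly what the declarative example computes, written by hand with plain dictionaries. This is only an illustrative sketch of the mapping (path selection, key renaming, default value, nested item). It is not how data_extractor is implemented, and `extract_users` is a hypothetical name.

```python
def extract_users(payload):
    """Hand-written equivalent of the declarative User/Count example."""
    results = []
    # Mirrors the "data.users[*]" path with is_many=True.
    for user in payload["data"]["users"]:
        results.append(
            {
                # Field `name_` is exposed under the key "name".
                "name": user["name"],
                # default=17 applies when "age" is absent.
                "age": user.get("age", 17),
                # The nested Count item reads from the same user object.
                "count": {
                    "followings": user["countFollowings"],
                    "fans": user["countFans"],
                },
            }
        )
    return results


payload = {
    "data": {
        "users": [
            {"name": "john", "age": 19, "countFollowings": 14, "countFans": 212},
            {"name": "jack", "description": "", "countFollowings": 54, "countFans": 312},
        ]
    }
}

if __name__ == "__main__":
    print(extract_users(payload))
```

The declarative version keeps the field paths and defaults next to the field names, which tends to be easier to maintain as the number of fields grows.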



  • Supports Python 3.13


Environment Setup

Clone the source code from GitHub.

git clone https://github.com/linw1995/data_extractor.git
cd data_extractor

Set up the development environment. Make sure the pdm, pre-commit, and nox CLIs are installed in your environment.

make init
make PYTHON=3.7 init  # for a specific Python version
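
If `make init` fails, one thing worth checking is whether the three required CLIs are actually on your PATH. The following is a small sketch using only the standard library; `missing_clis` is a hypothetical helper and not part of this repository.

```python
import shutil

# The CLIs that the setup step expects to find.
REQUIRED_CLIS = ("pdm", "pre-commit", "nox")


def missing_clis(names=REQUIRED_CLIS):
    """Return the names from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_clis()
    if missing:
        print("Missing CLIs: " + ", ".join(missing))
    else:
        print("All required CLIs are available.")
```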


Use pre-commit to install the linters that enforce a consistent code style.

make pre-commit

Run the linters. Some of them run via the nox CLI, so make sure it is installed.

make check-all


Run quick tests.


Run quick tests with verbose output.

make vtest

Run tests with coverage. Testing across multiple Python environments is powered by the nox CLI.

make cov