The fastest web crawler and indexer. Foundational building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Blacklisting and Budgeting Depth
- Changelog
The simplest way to get started is Spider Cloud, a pain-free hosted service. For local installation, see the spider or spider_cli directory. You can also use spider from Node.js via spider-nodejs and from Python via spider-py.
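For local use as a Rust library, a minimal crawl looks roughly like the sketch below (assuming the `spider` crate is added to `Cargo.toml` with its default `tokio` runtime; the target URL is a placeholder):

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Configure a crawl rooted at the placeholder start URL.
    let mut website = Website::new("https://example.com");

    // Crawl concurrently, collecting discovered links as it goes.
    website.crawl().await;

    // Print every unique link found during the crawl.
    for link in website.get_links() {
        println!("{:?}", link);
    }
}
```

Options such as depth budgeting, blacklisting, and proxy settings from the feature list above are configured on the `Website` builder before calling `crawl`.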
See BENCHMARKS.
See EXAMPLES.
This project is licensed under the MIT license.
See CONTRIBUTING.