The fastest web crawler and indexer. Foundational building blocks for data curation workloads.
- Concurrent
- Streaming
- Decentralization
- Headless Chrome Rendering
- HTTP Proxies
- Cron Jobs
- Subscriptions
- Smart Mode
- Blacklisting and Budgeting Depth
- Changelog
The simplest way to get started is Spider Cloud, a pain-free hosted service. For local installation, see the spider or spider_cli directory. You can also use spider from Node.js via spider-nodejs and from Python via spider-py.
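For local use as a Rust library, a minimal crawl looks roughly like the sketch below (assuming the `spider` crate is added to `Cargo.toml` with its default `tokio` runtime; the target URL is a placeholder):

```rust
use spider::tokio;
use spider::website::Website;

#[tokio::main]
async fn main() {
    // Configure a crawl rooted at the placeholder start URL.
    let mut website = Website::new("https://example.com");

    // Crawl concurrently, collecting discovered links as it goes.
    website.crawl().await;

    // Print every unique link found during the crawl.
    for link in website.get_links() {
        println!("{:?}", link);
    }
}
```

Options such as depth budgeting, blacklisting, and proxy settings from the feature list above are configured on the `Website` builder before calling `crawl`.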
See BENCHMARKS.
See EXAMPLES.
This project is licensed under the MIT license.
See CONTRIBUTING.