A curated list of awesome packages, articles, and other cool resources from the Scrapy community. Scrapy is a fast, high-level web crawling and scraping framework for Python.
- Portia Visual scraping for Scrapy
- scrapy-redis Redis-based components for Scrapy.
- scrapyscript Run a Scrapy spider programmatically from a script or a Celery task, no project required.
- scrapyd A service daemon to run Scrapy spiders
- scrapyd-client Command line client for Scrapyd server
- python-scrapyd-api A Python wrapper for working with Scrapyd's API.
- SpiderKeeper A scalable admin UI for spider service
- scrapyrt HTTP server which provides an API for scheduling Scrapy spiders and making requests with spiders.
- scrapy-sentry Logs Scrapy exceptions into Sentry
- scrapy-statsd-middleware StatsD integration middleware for Scrapy
- scrapy-jsonrpc An extension to control a running Scrapy web crawler via JSON-RPC
- scrapy-fieldstats A Scrapy extension to log item coverage when the spider shuts down
- HttpProxyMiddleware A Scrapy middleware that changes the HTTP proxy from time to time.
- scrapy-proxies Processes Scrapy requests using a random proxy from a list to avoid IP bans and improve crawling speed.
- scrapy-rotating-proxies Use multiple proxies with Scrapy
- scrapy-random-useragent Scrapy middleware to set a random User-Agent for every request.
- scrapy-fake-useragent Random User-Agent middleware based on fake-useragent
- scrapy-crawlera Crawlera routes requests through a pool of IPs, throttling access by introducing delays and discarding IPs from the pool when they get banned from certain domains or have other problems.
- scrapy-elasticsearch A Scrapy pipeline which sends items to an Elasticsearch server
- scrapy-mongodb MongoDB pipeline for Scrapy.
- scrapy-s3pipeline Scrapy pipeline to store chunked items in an AWS S3 bucket
- scrapy-sqs-exporter Scrapy extension for outputting scraped items to an Amazon SQS instance
- scrapy-kafka-export Scrapy extension which writes crawled items to Kafka
- scrapy-rss-exporter An RSS exporter for Scrapy
- scrapy-splash Enables Scrapy to handle JavaScript
- scrapy-djangoitem Scrapy extension to write scraped items using Django models
- scrapy-deltafetch Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls
- scrapy-crawl-once A Scrapy middleware that avoids re-crawling pages which were already downloaded in previous crawls.
- scrapy-magicfields Scrapy middleware to add extra fields to items, like timestamp, response fields, spider attributes, etc.
- scrapy-pagestorage A Scrapy extension to store request and response information in a storage service.
Web Scraping in Python using Scrapy (with multiple examples)
-
Advanced Web Scraping: Bypassing "403 Forbidden," captchas, and more
- Scrapy: Powerful Web Scraping & Crawling with Python Online courses on Udemy