
Scrapy Inline Requests


A decorator for writing coroutine-like spider callbacks.

Requires Scrapy 1.0 or later. Supports Python 2.7 and 3.4+.

Usage

The spider below shows a simple use case of scraping a page and following a few links:

from scrapy import Spider, Request
from inline_requests import inline_requests

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_resp = yield Request(response.urljoin('?page=%d' % i))
            urls.append(next_resp.url)
        yield {'urls': urls}

See the examples/ directory for a more complex spider.
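The decorator works by driving the generator itself: each yielded Request is scheduled with a callback that sends the downloaded response back into the generator, so the callback reads like sequential code. A minimal, framework-free sketch of that trampoline idea (the names `FakeRequest`, `FakeResponse`, `fetch`, and `drive` are illustrative stand-ins, not the library's API):

```python
class FakeRequest:
    def __init__(self, url):
        self.url = url

class FakeResponse:
    def __init__(self, url):
        self.url = url

def fetch(request):
    # Stand-in for Scrapy's downloader: return a response synchronously.
    return FakeResponse(request.url)

def drive(generator):
    """Resume the generator, feeding each yielded request's response back in."""
    results = []
    try:
        item = next(generator)
        while True:
            if isinstance(item, FakeRequest):
                # Send the "downloaded" response back into the callback,
                # resuming it at the `yield` expression.
                item = generator.send(fetch(item))
            else:
                # Anything else (e.g. a dict) is a scraped item.
                results.append(item)
                item = next(generator)
    except StopIteration:
        pass
    return results

def parse(response):
    urls = [response.url]
    for i in range(3):
        next_resp = yield FakeRequest('%s?page=%d' % (response.url, i))
        urls.append(next_resp.url)
    yield {'urls': urls}

items = drive(parse(FakeResponse('http://example.com/')))
```

In the real library the driving is asynchronous (responses arrive via Scrapy's engine rather than a synchronous `fetch`), but the generator `send` mechanism is the same.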

Known Issues

  • Middlewares can drop or ignore non-200 status responses, preventing the callback from resuming its execution. This can be overcome by setting the handle_httpstatus_all flag in the request's meta. See the HttpError middleware documentation.
  • High concurrency and large responses can cause higher memory usage.
  • This decorator assumes your method has the signature (self, response).
  • The decorated method must return a generator instance.
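For the first issue above, opting a request out of HttpError filtering is done per request via its meta dict. A minimal sketch, assuming a Scrapy 1.x project:

```python
from scrapy import Request

# handle_httpstatus_all tells HttpErrorMiddleware to pass every response
# through to the callback, regardless of its status code.
req = Request('http://httpbin.org/status/503',
              meta={'handle_httpstatus_all': True})
```

Use this sparingly: the callback then has to handle error responses itself.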