
Scrapy Inline Requests


A decorator for writing coroutine-like spider callbacks.

Requires Scrapy 1.0 or later. Supports Python 2.7 and 3.4+.

Usage

The spider below shows a simple use case of scraping a page and following a few links:

from scrapy import Spider, Request
from inline_requests import inline_requests

class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_resp = yield Request(response.urljoin('?page=%d' % i))
            urls.append(next_resp.url)
        yield {'urls': urls}

See the examples/ directory for a more complex spider.
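The decorator works by driving the generator itself: each yielded Request is scheduled with a callback that sends the downloaded response back into the generator, so the callback reads like sequential code. A minimal, framework-free sketch of that trampoline idea (the names `FakeRequest`, `FakeResponse`, `fetch`, and `drive` are illustrative stand-ins, not the library's API):

```python
class FakeRequest:
    def __init__(self, url):
        self.url = url

class FakeResponse:
    def __init__(self, url):
        self.url = url

def fetch(request):
    # Stand-in for Scrapy's downloader: return a response synchronously.
    return FakeResponse(request.url)

def drive(generator):
    """Resume the generator, feeding each yielded request's response back in."""
    results = []
    try:
        item = next(generator)
        while True:
            if isinstance(item, FakeRequest):
                # Send the "downloaded" response back into the callback,
                # resuming it at the `yield` expression.
                item = generator.send(fetch(item))
            else:
                # Anything else (e.g. a dict) is a scraped item.
                results.append(item)
                item = next(generator)
    except StopIteration:
        pass
    return results

def parse(response):
    urls = [response.url]
    for i in range(3):
        next_resp = yield FakeRequest('%s?page=%d' % (response.url, i))
        urls.append(next_resp.url)
    yield {'urls': urls}

items = drive(parse(FakeResponse('http://example.com/')))
```

In the real library the driving is asynchronous (responses arrive via Scrapy's engine rather than a synchronous `fetch`), but the generator `send` mechanism is the same.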

Known Issues

  • Middlewares can drop or ignore non-200 status responses, preventing the callback from resuming its execution. This can be overcome by setting the handle_httpstatus_all flag in the request's meta. See the HttpError middleware documentation.
  • High concurrency and large responses can cause higher memory usage.
  • This decorator assumes your method has the signature (self, response).
  • The decorated method must return a generator instance.
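For the first issue above, opting a request out of HttpError filtering is done per request via its meta dict. A minimal sketch, assuming a Scrapy 1.x project:

```python
from scrapy import Request

# handle_httpstatus_all tells HttpErrorMiddleware to pass every response
# through to the callback, regardless of its status code.
req = Request('http://httpbin.org/status/503',
              meta={'handle_httpstatus_all': True})
```

Use this sparingly: the callback then has to handle error responses itself.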