A decorator for writing coroutine-like spider callbacks.
Requires Scrapy>=1.0 and supports Python 2.7+ and 3.4+.
- Free software: MIT license
- Documentation: https://scrapy-inline-requests.readthedocs.org.
The spider below shows a simple use case of scraping a page and following a few links:
```python
from scrapy import Spider, Request
from inline_requests import inline_requests


class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_resp = yield Request(response.urljoin('?page=%d' % i))
            urls.append(next_resp.url)
        yield {'urls': urls}
```
See the examples/ directory for a more complex spider.
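For instance, the same pattern extends to following links discovered in a page. The sketch below is illustrative rather than taken from the library's examples: the httpbin endpoint and item fields are made up, and it assumes that non-Request values yielded mid-generator are emitted as ordinary callback output, as the example above suggests:

```python
from scrapy import Spider, Request
from inline_requests import inline_requests


class LinksSpider(Spider):
    name = 'linksspider'
    # httpbin serves a small page with a handful of links here.
    start_urls = ['http://httpbin.org/links/5']

    @inline_requests
    def parse(self, response):
        # Each `yield Request(...)` suspends the generator until the
        # response arrives, so the links are fetched sequentially.
        for href in response.xpath('//a/@href').extract():
            detail = yield Request(response.urljoin(href))
            # Anything yielded that is not a Request is passed through
            # to Scrapy as regular callback output.
            yield {'url': detail.url, 'status': detail.status}
```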
Known issues:
- Middlewares can drop or ignore non-200 status responses, which prevents the callback from resuming execution. This can be overcome by setting the `handle_httpstatus_all` flag; see the HttpError middleware documentation and the sketch after this list.
- High concurrency and large responses can cause higher memory usage.
- This decorator assumes your method has the signature (self, response).
- The decorated method must return a generator instance.
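For the first point, Scrapy's standard `handle_httpstatus_all` request meta key lets a response of any status reach the callback. A minimal sketch (the URLs are illustrative):

```python
from scrapy import Spider, Request
from inline_requests import inline_requests


class StatusSpider(Spider):
    name = 'statusspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        # handle_httpstatus_all asks the HttpError middleware to pass
        # every response through, so the generator resumes even on a 404.
        resp = yield Request(response.urljoin('/status/404'),
                             meta={'handle_httpstatus_all': True})
        yield {'url': resp.url, 'status': resp.status}
```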