Warning
Current status is "experimental".
scrapy-poet implements the Page Object pattern for Scrapy.
License is BSD 3-clause.
pip install scrapy-poet
scrapy-poet requires Python >= 3.6 and Scrapy 2.1.0+.
First, enable the middleware in your settings.py:
DOWNLOADER_MIDDLEWARES = {
    'scrapy_poet.InjectionMiddleware': 543,
}
After that, you can write spiders that use the Page Object pattern to separate extraction code from the spider:
import scrapy
from web_poet.pages import WebPage


class BookPage(WebPage):
    def to_item(self):
        return {
            'url': self.url,
            'name': self.css("title::text").get(),
        }


class BooksSpider(scrapy.Spider):
    name = 'books'
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        links = response.css('.image_container a')
        yield from response.follow_all(links, self.parse_book)

    def parse_book(self, response, book_page: BookPage):
        yield book_page.to_item()
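
In the spider above, parse_book receives a ready-made BookPage simply because its book_page argument is annotated with that type. The sketch below illustrates the general idea of annotation-driven injection using only the standard library. It is not scrapy-poet's actual implementation: FakeResponse, PageObject, and build_page_objects are hypothetical names invented for this illustration, and the real middleware may work differently.

```python
import inspect
from typing import Callable, Dict, get_type_hints


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (URL and title only)."""

    def __init__(self, url: str, title: str) -> None:
        self.url = url
        self.title = title


class PageObject:
    """Hypothetical base class: a page object wraps a single response."""

    def __init__(self, response: FakeResponse) -> None:
        self.response = response


class BookPage(PageObject):
    def to_item(self) -> Dict[str, str]:
        # Extraction logic lives here, not in the spider callback.
        return {"url": self.response.url, "name": self.response.title}


def build_page_objects(
    callback: Callable, response: FakeResponse
) -> Dict[str, PageObject]:
    """Build one page object per argument annotated with a PageObject subclass."""
    hints = get_type_hints(callback)
    hints.pop("return", None)  # a return annotation is not an argument
    return {
        name: hint(response)
        for name, hint in hints.items()
        if inspect.isclass(hint) and issubclass(hint, PageObject)
    }


def parse_book(response, book_page: BookPage):
    # Same shape as the spider callback: the page object arrives as an argument.
    yield book_page.to_item()


response = FakeResponse("http://books.toscrape.com/", "All products")
kwargs = build_page_objects(parse_book, response)
items = list(parse_book(response, **kwargs))
print(items)
```

Because the callback only declares which page object it needs, the extraction code in BookPage can be exercised on its own, without running a spider.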
TODO: document the motivation and the remaining features, provide more usage examples, explain shortcuts, etc. For now, please see the spiders in the "example" folder: https://github.com/scrapinghub/scrapy-poet/tree/master/example/example/spiders
- Source code: https://github.com/scrapinghub/scrapy-poet
- Issue tracker: https://github.com/scrapinghub/scrapy-poet/issues
Use tox to run tests with different Python versions:
tox
The command above also runs type checks; we use mypy.