Support item definitions using dataclasses
BurnzZ opened this issue · 0 comments
BurnzZ commented
Currently, using JSON Schema items with dataclasses (or attrs) doesn't work. Here's a quick example:
from dataclasses import dataclass
from typing import Optional

from scrapy_jsonschema.item import JsonSchemaItem


class BookSchemaItem(JsonSchemaItem):
    jsonschema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "Book",
        "description": "A Book item extracted from books.toscrape.com",
        "type": "object",
        "properties": {
            "url": {
                "description": "Book's URL",
                "type": "string",
                "pattern": "^https?://[\\S]+$"
            },
            "title": {
                "description": "Book's title",
                "type": "string"
            }
        },
        "required": ["url"]
    }


@dataclass
class BookItem(BookSchemaItem):
    url: str
    title: Optional[str] = None
This is mostly because scrapy-jsonschema defines the item's fields via https://github.com/scrapy-plugins/scrapy-jsonschema/blob/master/scrapy_jsonschema/item.py#L77-L79, which differs from how dataclasses and attrs create their fields.
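To illustrate the mismatch, here is a minimal standalone sketch. It does not reproduce the library's code: `SchemaMeta`, `SchemaItem` and `MixedItem` are hypothetical names, and the metaclass only assumes that field names are taken from the schema's "properties" at class-creation time. Dataclasses, in contrast, collect fields from annotations in the class body.

```python
import dataclasses


class SchemaMeta(type):
    """Hypothetical metaclass: derives a `fields` mapping from the schema."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Assumption: field names come from the schema's "properties",
        # not from annotations in the class body.
        schema = getattr(cls, "jsonschema", {})
        cls.fields = {key: {} for key in schema.get("properties", {})}
        return cls


class SchemaItem(metaclass=SchemaMeta):
    jsonschema = {
        "type": "object",
        "properties": {"url": {"type": "string"}, "title": {"type": "string"}},
    }


@dataclasses.dataclass
class MixedItem(SchemaItem):
    # Only annotated attributes become dataclass fields.
    extra: int = 0


# The two mechanisms keep separate, unsynchronized field registries.
schema_field_names = sorted(MixedItem.fields)
dataclass_field_names = [f.name for f in dataclasses.fields(MixedItem)]
```

In this sketch the schema-driven registry knows about `url` and `title`, while `dataclasses.fields()` only sees `extra`, so neither side is aware of the other's fields.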
We should have a better way of defining the JSON Schema inside dataclasses, which would mean rewriting some portions of the library.
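One possible direction, sketched under assumptions rather than proposed as the final design, is to treat the dataclass as the source of truth and derive a basic schema from its annotations. The helper `schema_from_dataclass` and the type mapping below are hypothetical and not part of scrapy-jsonschema; details such as "pattern" or "description" would still need another mechanism (for example, field metadata).

```python
import dataclasses
from typing import Optional, Union, get_args, get_origin, get_type_hints

# Hypothetical mapping from Python types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_dataclass(cls):
    """Build a minimal JSON Schema dict from a dataclass (illustrative only)."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for field in dataclasses.fields(cls):
        hint = hints[field.name]
        # Assumption: Optional[X] is treated as X, with None meaning "absent".
        if get_origin(hint) is Union:
            non_none = [a for a in get_args(hint) if a is not type(None)]
            if len(non_none) == 1:
                hint = non_none[0]
        json_type = _JSON_TYPES.get(hint)
        properties[field.name] = {"type": json_type} if json_type else {}
        # Fields without any default are considered required.
        has_default = (
            field.default is not dataclasses.MISSING
            or field.default_factory is not dataclasses.MISSING
        )
        if not has_default:
            required.append(field.name)
    return {"type": "object", "properties": properties, "required": required}


@dataclasses.dataclass
class BookItem:
    url: str
    title: Optional[str] = None


book_schema = schema_from_dataclass(BookItem)
```

Here `book_schema` marks `url` as required and `title` as optional, matching the intent of the hand-written schema in the example above, minus the constraints that annotations alone cannot express.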