scrapy-plugins/scrapy-jsonschema

Support item definitions using dataclasses

BurnzZ opened this issue

Currently, JSON Schema items don't work when the item is defined with dataclasses (or attrs). Here's a quick example:

```python
from dataclasses import dataclass
from typing import Optional

from scrapy_jsonschema.item import JsonSchemaItem


class BookSchemaItem(JsonSchemaItem):
    jsonschema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": "Book",
        "description": "A Book item extracted from books.toscrape.com",
        "type": "object",
        "properties": {
            "url": {
                "description": "Book's URL",
                "type": "string",
                "pattern": "^https?://[\\S]+$"
            },
            "title": {
                "description": "Book's title",
                "type": "string"
            }
        },
        "required": ["url"]
    }


@dataclass
class BookItem(BookSchemaItem):
    url: str
    title: Optional[str] = None
```

This is mostly due to how scrapy-jsonschema defines the item's fields (https://github.com/scrapy-plugins/scrapy-jsonschema/blob/master/scrapy_jsonschema/item.py#L77-L79), which differs from how dataclasses and attrs create theirs.
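One way to see the mismatch is a toy model. `DictBackedItem` below is a hypothetical stand-in, not scrapy's code: like `scrapy.Item`, it keeps field values in an internal dict accessed as `item["key"]` and refuses public attribute assignment. The `__init__` that `@dataclass` generates assigns each field as an instance attribute (`self.url = url`), so the two models collide as soon as an instance is created.

```python
from dataclasses import dataclass
from typing import Optional


class DictBackedItem:
    """Hypothetical toy stand-in for a dict-backed item (not scrapy's code).

    Values live in `self._values` and are accessed as item["key"]; setting a
    public attribute is refused, similar in spirit to scrapy.Item.
    """

    def __init__(self, **kwargs):
        # "_values" starts with "_", so __setattr__ below lets it through.
        self._values = dict(kwargs)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __setattr__(self, name, value):
        # Only private attributes may be set; fields must go through item[...].
        if not name.startswith("_"):
            raise AttributeError(f"Use item[{name!r}] = {value!r} to set field value")
        super().__setattr__(name, value)


@dataclass
class BookItem(DictBackedItem):
    # The generated __init__ runs `self.url = url`, which hits __setattr__.
    url: str
    title: Optional[str] = None
```

With this toy, `BookItem(url="https://books.toscrape.com/")` raises `AttributeError` from `__setattr__`, while `DictBackedItem(url="...")` works because it never sets a public attribute.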

We should provide a better way to define the JSON Schema for dataclass-based items, which will mean rewriting some portions of the library.
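A minimal sketch of one possible direction, under loud assumptions: instead of deriving item fields from the schema, let the dataclass declare its own fields and have a decorator attach the schema and check that the two agree. `jsonschema_dataclass` and `missing_required` are hypothetical names, not part of scrapy-jsonschema. Full validation could then be delegated to a JSON Schema validator (for example the `jsonschema` package's `validate(instance, schema)` applied to `dataclasses.asdict(item)`); only the `required` check is shown here so the sketch stays dependency-free.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any, Dict, List, Optional

BOOK_SCHEMA: Dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Book",
    "type": "object",
    "properties": {
        "url": {"type": "string", "pattern": "^https?://[\\S]+$"},
        "title": {"type": "string"},
    },
    "required": ["url"],
}


def jsonschema_dataclass(schema: Dict[str, Any]):
    """Hypothetical decorator: attach `schema` to a dataclass and check that
    the dataclass declares exactly the properties the schema describes."""

    def decorator(cls):
        if not is_dataclass(cls):
            raise TypeError(f"{cls.__name__} must be a dataclass")
        declared = {f.name for f in fields(cls)}
        expected = set(schema.get("properties", {}))
        if declared != expected:
            raise TypeError(
                f"{cls.__name__} fields {sorted(declared)} do not match "
                f"schema properties {sorted(expected)}"
            )
        # Set after @dataclass has run, so this is a plain class attribute,
        # not a dataclass field.
        cls.jsonschema = schema
        return cls

    return decorator


def missing_required(item) -> List[str]:
    """Return the schema's required properties that are absent or None."""
    data = asdict(item)
    return [name for name in item.jsonschema.get("required", []) if data.get(name) is None]


# Decorators apply bottom-up: @dataclass first, then the schema check.
@jsonschema_dataclass(BOOK_SCHEMA)
@dataclass
class BookItem:
    url: str
    title: Optional[str] = None
```

With this sketch, `missing_required(BookItem(url="https://books.toscrape.com/"))` returns `[]`, and declaring a dataclass whose fields don't match the schema's `properties` raises `TypeError` at class-definition time. Keeping the dataclass as the source of fields avoids fighting the dataclass machinery; the trade-off is that the field list is written twice (once in the schema, once in the class), which the decorator's check is meant to keep in sync.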