scrapy/itemadapter

An error is raised when dealing with dataclasses

ejulio opened this issue · 9 comments

I think there may be some issue in repr or similar that raises an exception if the dataclass is empty (no values set).
Here is a minimal example..

from dataclasses import dataclass
from itemadapter import ItemAdapter

@dataclass(init=False)
class Product:
    name: str
    price: float

p = Product()
values = ItemAdapter(p).items()  # error
list(values)
# []
print(values)  # raises an error

Seems like the issue comes directly from dataclasses. __repr__ will just return fields and values, I guess it probably expects a custom __init__ method if you pass init=False.

Python 3.7.4 (default, Aug  6 2019, 15:53:23) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass(init=False)
... class Product:
...     name: str
...     price: float
... 
>>> p = Product()
>>> p      
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dataclasses.py", line 357, in wrapper
    result = user_function(self)
  File "<string>", line 2, in __repr__
AttributeError: 'Product' object has no attribute 'name'

I get the impression that dataclasses are not really meant to be completely functional without an __init__ method that sets the defined fields, either automatically created by the module of explicitly provided by the user. In that sense, it seems to me like providing default values is the way to go:

from dataclasses import dataclass, field

@dataclass
class Product:
    name: str = field(default_factory=list)
    price: float = field(default_factory=list)

Do you think we should catch the AttributeError internally?

Agreed @elacuesta
An I think catching the exception makes sense, to some extent

Two cents from @kmike: delay item creation until load_item() is called

@Gallaecio Should that last comment go in itemloaders instead?

Indeed 🤦

Should we just re-raise as a ValueError in scenarios where this causes an exception to raise?

This would require users to set all fields as optional (so they don't get the error).
I see this as a modelling issue, where I need to set something that is required as optional and it can hide problems.

I guess a proper solution might require us to change the API a bit and not expose the item until it is created (load_item), or something like that.
Then, I think it makes sense to raise ValueError if the item can't be properly created because a required field was not set.

Maybe, the item we expose is a plain dict and has the interpretation of an intermediary item

I guess a proper solution might require us to change the API a bit and not expose the item until it is created (load_item), or
something like that.

Note that load_item is only relevant for itemloaders. And yes, for itemloaders we should do that, and I plan to work on it.