Spidermon fails to add `_validation` field to the item.
proway2 opened this issue · 10 comments
Ubuntu 22.04, Python 3.10.9, Spidermon 1.17.1
If an item defined with attrs
library and this condition https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L140 is True
, then it fails here https://github.com/scrapinghub/spidermon/blob/master/spidermon/contrib/scrapy/pipelines.py#L141 with this message:
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: 'ProductList' object does not support item assignment
This is a limitation on the Item definition as it don't support adding new fields.
In theory, using slots=False
should work according to the attr dosc, however it didn't work for me.
Alternatively, adding the field _validation
to the class should remove the error, however, you will get _validation
automatically populated to None on each item which is not the best solution either.
On my experience I had to just convert from Attrs class to dict
before spidermon validaton.
That class is just an interface to simplify read and write access to the different types of Item classes. It doesn't add any additional functionality.
I don't know if we could overcharge it to support additional fields not defined on the original item class.
Maybe we can start by capturing the exception and raising a warning instead.
Alternatively, adding the field
_validation
to the class should remove the error
I think this is the way to go. You either disable the creation of this _validation
field, if Spidermon allows for that, or you add the field to your item class.
@Gallaecio Shall we close this ticket as wont fix or similar?
Well, I think there might be an action here even if we go with #379 (comment).
- If the
_validation
field cannot be disabled without disabling validation itself (I don’t know if that is the case), then that’s something to support. - We should document this scenario, i.e. the need to add
_validation
to an item class that does not support defining unknown fields, and the option to disable the generation of that field instead.
This can be turn off with SPIDERMON_VALIDATION_ADD_ERRORS_TO_ITEMS
- https://spidermon.readthedocs.io/en/latest/item-validation.html#spidermon-validation-add-errors-to-items
We may expand the docs to explicitly state that if enabled, it will only work on items that have defined the _validation
field or that support adding undefined fields, like dicts
.
Edit: Additionally, SPIDERMON_VALIDATION_ERRORS_FIELD
can be use to replace the _validation
field with any other field name.
This problem was added by #358 and it is a backward incompatible change that we didn't inform our users that it was introduced. So we need to do something about it. Probably one new release that indicates that _validation
field will only work if we are dealing with dict
item.
We already have this situation in Scrapy related to files
and files_urls
field for the media pipelines (https://docs.scrapy.org/en/latest/topics/media-pipeline.html#usage-example). So adding the information that we need to specify _validation
should be enough. However, I believe that we should put as a warning, not just in the middle of the text.