scrapy-plugins/scrapy-jsonschema

Drop non-conforming fields instead of whole items

JakeCowton opened this issue · 3 comments

It would make a lot more sense if only the fields that don't match the schema were dropped instead of the entire item, or if that behavior were at least configurable.

If the dropping of a field causes the item to not have the required fields, then drop the entire item.
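A minimal sketch of the proposed behavior follows, written with the standard library only. It has two steps: first drop every field that fails its property schema, then reject the whole item if any field listed in `required` has been removed. The names `drop_invalid_fields`, `_matches`, `JSON_TYPES` and `SCHEMA` are hypothetical and do not come from scrapy-jsonschema. The sketch also assumes a simplified schema where each property is checked only against its `type` keyword. A real implementation would delegate to a full JSON Schema validator. Inside a Scrapy item pipeline's `process_item`, a `None` result could be turned into `raise scrapy.exceptions.DropItem(...)`.

```python
from typing import Any, Optional

# Hypothetical, simplified mapping from JSON Schema primitive type names to
# Python types. Only the "type" keyword of each property is checked here.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def _matches(value: Any, prop_schema: dict) -> bool:
    """Return True if value satisfies the property's "type" keyword."""
    expected = prop_schema.get("type")
    if expected is None:
        return True  # no type constraint on this field
    names = expected if isinstance(expected, list) else [expected]
    for name in names:
        # bool subclasses int in Python, but JSON Schema treats them as distinct.
        if isinstance(value, bool) and name in ("integer", "number"):
            continue
        if isinstance(value, JSON_TYPES[name]):
            return True
    return False


def drop_invalid_fields(item: dict, schema: dict) -> Optional[dict]:
    """Drop non-conforming fields; return None if the whole item must be dropped."""
    properties = schema.get("properties", {})
    # Keep fields the schema doesn't describe, plus fields that validate.
    cleaned = {
        key: value
        for key, value in item.items()
        if key not in properties or _matches(value, properties[key])
    }
    # If dropping a field removed a required one, reject the entire item.
    if any(field not in cleaned for field in schema.get("required", [])):
        return None
    return cleaned


SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name"],
}

print(drop_invalid_fields({"name": "Widget", "price": "cheap"}, SCHEMA))  # {'name': 'Widget'}
print(drop_invalid_fields({"name": 42, "price": 9.99}, SCHEMA))  # None
```

In the first call only the bad `price` is discarded. In the second call the invalid `name` is required, so the whole item is rejected.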

Any thoughts on this approach?

Just for reference, I'm happy to open a PR myself; I just wanted to check whether this would be desired functionality.

Hi @JakeCowton , thanks for the suggestion!

Yeah, I can see myself using this new feature in certain instances.

Feel free to submit a high-level overview of your proposal and we'd be happy to review. :)

I ended up writing my own implementation of this using an item pipeline and pydantic, since it became a requirement for one of my dependencies. I doubt there's appetite to switch the underlying schema structure from raw JSON to pydantic, so I'm closing this ticket.