Rediscuss about validate on assignment
Opened this issue · 4 comments
Currently, we have Pydantic set to validate property values on assignment. This has generally worked fine, although it imposes extra requirements on users (e.g you must supply a label
value in the arguments of a new instance of a Manifest
class, rather than being able to set it later).
With the work for #144, the new validation on Canvas
to ensure the existence of height
& width
and/or duration
broke a lot of tests which were relying on the previous behaviour of those fields being Optional
(itself an issue with how datamodel-code-generator interprets the constraints in the JSON Schema) and thus not needing to be set at instantiation.
Example of broken tests: https://github.com/iiif-prezi/iiif-prezi3/actions/runs/4380645190/jobs/7667931447#step:6:42
It also broke the behavior of some recipes/helpers:
Example of broken recipe: https://github.com/iiif-prezi/iiif-prezi3/blob/main/docs/recipes/scripts/0003-mvm-video-method1.py
This recipe relies on setting the HWD after the Canvas creation, originally because I wanted to use the same dictionary values for the resource and the canvas (slightly stymied, it turns out by IIIF/cookbook-recipes#374, but the idea stands 😅)
A broken helper would be create_canvas_from_iiif
, as this starts by instantiating a canvas object with kwargs, which wouldn't include HWD (because you're expecting to get them from the image service) and then uses set_hwd
to set them afterwards.
We have several approaches available to us:
- Keep the behaviour as in the validator POC - this means
a) The addition of HWD validators becomes a breaking change because it could cause existing code to stop working (the fact that the output JSON could be invalid is a slightly different matter - of which more later) - we would (probably) need to move to version 2.0, although it's not a "change in the public interface" so maybe not. Wahey semver discussions!
b) We need to update the existing tests and helpers, potentially involving a slightly gnarly hack forcreate_canvas_from_iiif
of setting fake values first (although we do this already forid
) - Turn off
validate_assignment
in the Pydantic config - this would allow HWD to be set after instantiation
a) We would have to shift the validation to being called elsewhere in order to ensure we did not output invalid JSON, thejson
function itself seems a sensible place
b) We would need to rewrite the tests that currently expect aValidationError
to be raised on assignment (e.g https://github.com/iiif-prezi/iiif-prezi3/blob/main/tests/test_basic.py#L70-L81)
c) This could "break" existing implementations in as much as if they weren't setting HWD on a Canvas we would currently let them output JSON, and having post-hoc validation would stop that. I don't see this as breaking in quite the same way, as it's really a fix for a bug. Maybe this is semantics?
d) This would temporarily allow users to create objects with garbage data - although they wouldn't be able to output them, so maybe that is just their lookout? - Scrap the validators and leave the current behaviour on the library, allowing the users to output invalid JSON if they don't set HWD on a canvas
My personal position is we should adopt 1 or 2, but I can't decide which I prefer.
In favour of 1 - We already require other non-optional fields to be set at instantiation (even if we let the AutoHelpers deal with setting some of them and the user doesn't necessarily explicitly have to), so why should HWD be any different? Yes it's slightly inconvenient to change existing behaviour, but it's to fix a mistake (and we have also technically done this before because the latest skeleton introduced the regexp constraints of label languages, which was technically breaking)
In favour of 2 - Outputting the JSON is the only time the data actually needs to be correct, and saving validation until then might make the library slightly friendlier to work with / more flexible in allowing the user to use the workflow best for them - maybe they don't have the HWD (or label, or etc etc) available at object instantiation, but as long as they supply it before outputting JSON, why do we care?
To me I would favor consistency first and foremost in a library, so would lean towards #1. If this is how the library behaves in other circumstances, it feels like it would be confusing for this one instance to behave differently. I get that it's a "larger" change, but it also feels like we're fixing something that was broken (even though we didn't realize it until now) so it's also a necessary fix.
@iiif-prezi/prezi3-maintainers Any other thoughts on this? I'd like to do a bit more work on the library over the next week (in expectation that a few more people might look at it after the conference), so it would be good to get this resolved.
I agree with Hillel but in my opinion doing a post validation would be much more flexible and could also overcome some limitations of the JSON schema. I wanted to add a validation step before generating the JSON also to my library in particular for testing disjoint behaviours where you need to know where is your object to understand if the behaviour is correct.