oxan/djangorestframework-dataclasses

Patching

Closed this issue · 6 comments

I'm trying to build an API for a large body of synthetic data. By synthetic I mean it's put together from a pile of models.

The approach I'm trying is:
0: Define dataclasses that describe the synthetic structure.
1: Assemble instances of those data classes with the calculated values.
2: Throw the whole lot at a DataclassSerializer.

To give you a sense of the size of what I'm messing with I have 12 dataclasses and there's about 40 points where they nest inside each other. Pretty formatted minimal output for one detail view is over 300 lines of JSON.

On the get side this is all going silky smooth. Thank you, that alone has saved me a ton of hassle.

Now I need to allow patching into this structure. The client is allowed to send any sub-portion of the 300 lines as a patch, which means any subset of the underlying fields, leaving out large amounts of stuff that is not optional at the dataclass level.

What I've done that is working but a lil clunky is:
1: Assemble the dataclasses for the current state (same as 1 above).
2: Push that through the DataclassSerializer to get JSON (same as 2 above).
3: Apply the incoming JSON as a patch to the JSON from 2.
4: Feed that into the data argument of the DataclassSerializer. MySerializer(data=<JSON from 3>)
5: Get the updated dataclasses from the serializers validated data.
6: Compare the dataclasses from 1 with the dataclasses from 5 and action any differences.
7: If there were no errors return the serializer from 4 to the API.
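Steps 3 and 4 amount to recursively merging the incoming patch into the serialized representation. A minimal stdlib sketch of that merge (`deep_merge` is a hypothetical helper, not part of the library):

```python
def deep_merge(current: dict, patch: dict) -> dict:
    """Return a copy of `current` with `patch` applied: nested dicts are
    merged key-by-key, any other value (including lists) is overwritten."""
    merged = dict(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

current = {"name": "widget", "specs": {"mass": 3, "color": "red"}, "tags": ["a", "b"]}
patch = {"specs": {"color": "blue"}, "tags": ["c"]}
print(deep_merge(current, patch))
# {'name': 'widget', 'specs': {'mass': 3, 'color': 'blue'}, 'tags': ['c']}
```

Note that lists are overwritten wholesale rather than merged, which sidesteps the ambiguity about what a partial list should mean.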

A: Is there a better way I should be doing this?
In particular, it seems like I should be able to skip 2 and 3 and pass the JSON patch into the serializer at 4, along with the dataclasses from 1 and partial=True, and get the same result. But doing that results in lots of missing-field errors on the nested dataclasses.

B: Also, at 4, if I pass the dataclasses from 1 as instance= then it seems to ignore the values in data, but if I pass them as initial= then it works as expected, which is odd. Although with the setup above there's no particular need to pass them at all.

C: I'm not sure how to define readonly fields through this mess, it's not critical as I only check the writable stuff at 6, but it'd be nice. I don't want to build out serializers for it all as that removes a lot of the value I get from DataclassSerializer.
The writable stuff is only in a couple of dataclasses, is there a way I could define serializers for them and then tell the top level serializer if you find a dataclass X anywhere in the structure, use the X serializer?

oxan commented

A: Yes, the proper solution is setting partial=True and passing the JSON patch, but as you encountered that doesn't work correctly right now.

For the "simple" case where it's just dataclasses nested directly, fixing that shouldn't be too hard. However, if you also nest dataclasses into lists and/or dictionaries, it's problematic, because the user can't unambiguously express their intent. Example: a dataclass has a field with value [objectA, objectB] (with objectA and objectB instances of another dataclass), and the user submits a patch with value [modifiedObject]. Do we delete objectB from the list and patch objectA? But then how would deletion of objectA and modification of objectB be expressed? And how would you add an item to the list?

For this same reason DRF doesn't allow writable nested representations of models by default and requires you to explicitly write an update() and create() method. That same approach is already available with this serializer, by the way.

B: The value passed to initial is completely ignored by serializers (at least when serializing and deserializing); that's the case in DRF as well, IIUC. I don't think it's even documented as an option for serializers.

C: Yeah, this is a known limitation of the current architecture. You can specify which fields are readonly through the extra kwargs mechanism, but if the majority of your fields are readonly that becomes ugly quickly.

The intention was that you could specify specific serializers to be used for types by adding them to the serializer_field_mapping property. That turned out to be tricky for dataclasses (but it still works for any other classes!), but you can achieve the same effect by overriding build_nested_field. Now that I'm thinking about it I can probably get the field mapping working as well...
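The idea behind a per-type serializer mapping can be sketched in plain Python. The names below (`SERIALIZER_OVERRIDES`, `pick_serializer`) are hypothetical illustrations, not this library's API; the actual extension points mentioned above are serializer_field_mapping and build_nested_field:

```python
from dataclasses import dataclass

@dataclass
class X:
    value: str

# Hypothetical registry: map a dataclass type to the serializer to use for it,
# wherever it appears in the nested structure.
SERIALIZER_OVERRIDES = {X: "XSerializer"}

def pick_serializer(field_type: type, default: str = "DataclassSerializer") -> str:
    """Walk the MRO so subclasses of X also pick up the X serializer."""
    for base in field_type.__mro__:
        if base in SERIALIZER_OVERRIDES:
            return SERIALIZER_OVERRIDES[base]
    return default

print(pick_serializer(X))     # XSerializer
print(pick_serializer(dict))  # DataclassSerializer
```

A build_nested_field override would apply essentially this lookup when constructing the field for a nested dataclass.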

oxan commented

Regarding A, if you're using v0.7 or v0.8, you can try to replace the _strip_empty_sentinels function in serializers.py, lines 113-125, with this (untested):

    def _strip_empty_sentinels(self, x: AnyT, default=None) -> AnyT:
        """
        Recursively strip the `empty` sentinel values from dataclasses, replacing them with the
        corresponding value from `default` (or the field's default value if no `default` is given).
        """
        if dataclasses.is_dataclass(x) and not isinstance(x, type):
            # Keep only the fields that were actually provided, recursing into nested dataclasses.
            values = {field.name: self._strip_empty_sentinels(getattr(x, field.name), getattr(default, field.name, None))
                      for field in dataclasses.fields(x)
                      if getattr(x, field.name) is not empty}
            return dataclasses.replace(default, **values) if default else type(x)(**values)
        # Lists and dictionaries are replaced wholesale, so no default is threaded through.
        if isinstance(x, list):
            return [self._strip_empty_sentinels(item) for item in x]
        if isinstance(x, dict):
            return {key: self._strip_empty_sentinels(value) for key, value in x.items()}
        return x

A: In my particular use case I only have dataclasses and a couple of lists, and I don't delete by leaving things out of lists. But that's all specific to me, and I do have a working solution.

B: Why does including the original object cause the updates to get lost? Is it because they are nested?

C: Fixing the mapping would be very interesting.
Some of what I'm doing is generative, so it's driven by dynamic lists of fields, which is an additional reason why building serializers for the whole thing would be unpleasant.

C1: Having an opt-in alternative for writable fields would also be nice.

C2: I've no idea how it would work but declaring read/write at the dataclass level would also be nice.

oxan commented

B: I can't reproduce that:

>>> from dataclasses import dataclass
>>> from rest_framework_dataclasses.serializers import DataclassSerializer
>>> @dataclass
... class Nested:
...     field: str   
>>> @dataclass
... class Parent:
...     field: str
...     child: Nested
>>> inst = Parent(field='old parent', child=Nested(field='old child'))
>>> data = {'field': 'new parent', 'child': {'field': 'new child'}}
>>> ser = DataclassSerializer(dataclass=Parent, instance=inst, data=data)
>>> ser.is_valid()
True
>>> ser.validated_data
Parent(field='new parent', child=Nested(field='new child'))
>>> ser.save()
Parent(field='new parent', child=Nested(field='new child'))

Can you supply a minimal reproducer?

C1: While that's possible, I'm not too fond of it, as the whole DRF serializer architecture we build upon assumes that read-only fields are explicitly specified and everything is writable by default. It might be cleaner to just override build_typed_field to set the read_only flag unless otherwise specified.
C2: You can already mark fields as read-only on the dataclass level by marking them as final (I should document that).
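Detecting a Final annotation is plain typing machinery. As an illustration only (not the library's implementation), a field could be classified as read-only like this:

```python
import dataclasses
from typing import Final, get_origin, get_type_hints

@dataclasses.dataclass
class Person:
    name: str
    id: Final[int]  # marked Final: treat as read-only

def is_read_only(cls: type, field_name: str) -> bool:
    """A field counts as read-only when its annotation is wrapped in typing.Final."""
    hint = get_type_hints(cls).get(field_name)
    return get_origin(hint) is Final

print(is_read_only(Person, "id"))    # True
print(is_read_only(Person, "name"))  # False
```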

oxan commented

I'll close this issue now, as I think all the problems are now solved:
A: Support for partial updates of nested dataclasses has landed with commit f360139. Partial updates of composite fields (lists and dicts) aren't supported and will just do a complete overwrite of the field. That should solve your problem. For the reasons given before, I don't think this'll change.
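The resulting merge semantics can be illustrated with plain dataclasses: fields present in the patch replace the old values, nested dataclasses are merged field by field, and lists are overwritten wholesale. A sketch of the end result (not the library's code):

```python
import dataclasses

@dataclasses.dataclass
class Child:
    a: str
    b: str

@dataclasses.dataclass
class Parent:
    child: Child
    tags: list

old = Parent(child=Child(a="1", b="2"), tags=["x", "y"])

# A partial update touching only child.a and tags: child.b is preserved,
# while the tags list is replaced outright.
new = dataclasses.replace(old,
                          child=dataclasses.replace(old.child, a="9"),
                          tags=["z"])
print(new)  # Parent(child=Child(a='9', b='2'), tags=['z'])
```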
B: I can't reproduce this, feel free to reopen if you can supply a reproducer.
C1: I won't implement this for reasons mentioned previously. It should be pretty easy to invert this behaviour in a subclass though.
C2: Documentation added in commit c1cec80.

Thank you, if I get some breathing room I will try and reproduce it, but I wouldn't count on it.