Implement strict de-serialization option
johnthagen opened this issue · 3 comments
Currently, if extra data fields are passed into a DataclassSerializer
instance that are not modeled by the underlying @dataclass
, they are silently ignored. This could hide subtle bugs in clients that think they are sending a particular field, but in fact the field is being ignored and they are not made aware of it.
For example:
from dataclasses import dataclass
from rest_framework_dataclasses.serializers import DataclassSerializer
@dataclass
class Person:
name: str
height: float
class PersonSerializer(DataclassSerializer[Person]):
class Meta:
dataclass = Person
p = PersonSerializer(
data={
"name": "Bruce",
"height": 6.2,
"im_batman": True, # Extra, invalid field
}
)
p.is_valid(raise_exception=True) # Be able to configure the DataclassSerializer to fail validation here
A partial implementation (does not work with nested DataclassSerializer
s):
class StrictDataclassSerializer(DataclassSerializer[T], Generic[T]):
def validate(self, attrs: dict[str, Any]) -> dict[str, Any]:
# Need to check for initial_data as nested serializers will not have this field.
if hasattr(self, "initial_data"):
unknown_keys = set(self.initial_data.keys()) - set(fields.keys())
if unknown_keys:
raise ValidationError(f"Unknown keys found: {unknown_keys}")
return attrs
A feature like this could be enabled in the Meta
:
class PersonSerializer(DataclassSerializer[Person]):
class Meta:
dataclass = Person
strict = True
This behaviour isn't caused by nor specific to the DataclassSerializer
, it's consistent with what a stock serializer does:
class TestSerializer(serializers.Serializer):
test = serializers.CharField()
ser = TestSerializer(data={'test': 'hello', 'additional': 'extra'})
ser.is_valid() # True
ser.validated_data # OrderedDict([('test', 'hello')])
I'm hesitant to implement such a strict mode, since it's a layering violation as it changes the behaviour of DRF's default serializers. The goal of DataclassSerializer
is to autogenerate serializer fields based on a dataclass definition (and turn the returned value into a dataclass), everything that's not directly related to that should be implemented elsewhere.
If you want this behaviour, it should also be fairly straightforward to implement in your own project: use a subclass of DataclassSerializer
with the added validate()
method. By setting or defining the serializer_dataclass_field
field on that subclass, it'll also work for nested dataclasses (see e.g. HyperlinkedDataclassSerializer
at the bottom of serializers.py
for an example).
@oxan I'm happy to call this out of scope for djangorestframework-dataclasses
. As you mention, this is how the default DRF works (I may disagree with DRF's decision, but it is how it works).
We use Pydantic in other parts of our project and they have a setting for this (this is the Pydantic v1 version):
class StrictBaseModel(BaseModel):
class Config:
# Don't allow extra parameters
extra = Extra.forbid
This has caught subtle bugs in the past.
But in the DRF world, this is working as expected. Please feel free to close this issue and thank you for the ideas on how to implement this in user-land!
Yeah, I can see why rejecting extra data might be useful (though I could also make an argument for the opposite), but I indeed lean towards calling it out of scope for this project.
While writing my previous comment I realized that the documentation doesn't have an explicit example on how to subclass DataclassSerializer
, which might be useful. I'll add that.