[BUG] Nullable fields are not showing up when deserializing with field(default=None, metadata=config(exclude=lambda x: x is None))

Question


Closed this issue 9 months ago · 4 comments

Description

I'm not sure if I'm doing this correctly, but my goal is to deserialize JSON data that has optional properties and, when those optional properties are null, have them not show up in the deserialized version of the data.
Let's say I have this code:

from dataclasses import dataclass, field
from typing import Optional
from dataclasses_json import dataclass_json, LetterCase, config

@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class NewImage:
    pk: str = field(metadata=config(field_name="PK"))
    sk: str = field(metadata=config(field_name="SK"))
    created_by: str
    created_date_time: str
    optional_attribute_1: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))
    optional_attribute_2: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))

So when I receive data that has optional_attribute_1 but not optional_attribute_2, I want it to deserialize without optional_attribute_2.
I've looked at this issue, and that's how it says to ignore null values.

Code snippet that reproduces the issue

from dataclasses import dataclass, field
from typing import Optional
from dataclasses_json import dataclass_json, LetterCase, config


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class NewImage:
    pk: str = field(metadata=config(field_name="PK"))
    sk: str = field(metadata=config(field_name="SK"))
    created_by: str
    created_date_time: str
    optional_attribute_1: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))
    optional_attribute_2: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))


# I convert my JSON data to a dict before this step (I have to)
new_image = {"pk": "1", "sk": "1", "created_by": "blah", "created_date_time": "today", "optional_attribute_1": "blah"}

print(NewImage.from_dict(new_image))  # this also displays optional_attribute_2=None

Expected

I expect the deserialized object to include the optional attributes only when they are present in the serialized form:

NewImage(pk='1', sk='1', created_by='blah', created_date_time='today', optional_attribute_1='blah')

Actual

optional_attribute_2=None is present even though it was absent from the input:

NewImage(pk='1', sk='1', created_by='blah', created_date_time='today', optional_attribute_1='blah', optional_attribute_2=None)

Environment description

Python version: 3.11

Packages:

boto3==1.34.0
botocore==1.34.0
certifi==2023.11.17
charset-normalizer==3.3.2
dataclasses==0.6
dataclasses-json==0.6.1
dotenv==0.0.5
dynamodb-json==1.3
idna==3.6
jmespath==1.0.1
marshmallow==3.20.1
mypy-extensions==1.0.0
numpy==1.26.2
packaging==23.2
pandas==2.1.4
python-dateutil==2.8.2
python-dotenv==1.0.0
pytz==2023.3.post1
requests==2.31.0
s3transfer==0.9.0
simplejson==3.19.2
six==1.16.0
types-requests==2.31.0.10
typing-inspect==0.9.0
typing_extensions==4.8.0
tzdata==2023.3
urllib3==2.0.7

Answer 1 · 2023-12-17T19:06:28.000Z

Updated the description: added expected/actual sections and code highlighting, added imports, and moved the environment details under a <details> tag.

Answer 2 · 2023-12-17T19:17:55.000Z

TL;DR

You are confusing the dataclasses and dataclasses_json functionality.

Long Read

@yakovsushenok even though I agree with your suggestion (that's a feature I would also like to exist), you are misguided. When you print the object, the method being called is __repr__ from the dataclasses package itself, not one from dataclasses_json. The latter controls only (de)serialization, and dataclasses handles the rest. The extra parameter exclude controls only whether the field is present in the serialized data, while __repr__ always prints every field that has repr=True (the default). You can override the __repr__ generated by the dataclasses package in your class if you want.
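To see that the generated __repr__ only consults each field's repr flag, here is a minimal standard-library sketch with no dataclasses_json involved. The class name Demo and the metadata key "exclude" are illustrative assumptions; the point is that dataclasses stores metadata without interpreting it, so a predicate placed there has no effect on repr() or print().

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Demo:
    # Metadata is stored verbatim on the Field object; the generated __repr__ never reads it.
    shown: Optional[str] = field(default=None, metadata={"exclude": lambda v: v is None})
    # repr=False is the switch that hides a field from the generated __repr__.
    hidden: Optional[str] = field(default=None, repr=False)


d = Demo()
print(repr(d))  # Demo(shown=None)
print(d)        # print() falls back to __repr__ because Demo defines no __str__
```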

Example

Here, take a look:

from dataclasses import dataclass, field
from typing import Optional
from dataclasses_json import dataclass_json, LetterCase, config


@dataclass_json(letter_case=LetterCase.CAMEL)
@dataclass
class ReprTest:
    optional_exclude: Optional[str] = field(default=None, metadata=config(exclude=lambda x: x is None))
    optional_no_repr: Optional[str] = field(default=None, repr=False)

r1 = ReprTest()
r2 = ReprTest(optional_exclude='one', optional_no_repr='two')

print("FIRST:")
print(r1)
print(r1.to_json())
print()

print("SECOND:")
print(r2)
print(r2.to_json())

Output

FIRST:
ReprTest(optional_exclude=None)
{"optionalNoRepr": null}

SECOND:
ReprTest(optional_exclude='one')
{"optionalExclude": "one", "optionalNoRepr": "two"}

As you can see, __repr__ behaves the same way in both scenarios: it always prints optional_exclude and never prints optional_no_repr.
However, in .to_json() optionalExclude is present in one case and absent in the other, while optionalNoRepr is always present.
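The same split can be reproduced with the standard library alone. The helper encode_without_excluded below is a hypothetical sketch, not dataclasses_json's actual implementation: it reads an exclude predicate from each field's metadata (an assumed key name) and applies it only while building the output dict. That mirrors the behaviour described above: exclusion is an encoding step, so constructing or printing the object is unaffected.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, Optional


def encode_without_excluded(obj: Any) -> Dict[str, Any]:
    """Hypothetical encoder: skip a field when metadata['exclude'](value) is true."""
    out: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        exclude = f.metadata.get("exclude")
        if exclude is not None and exclude(value):
            continue  # dropped from the encoded dict only; the attribute still exists
        out[f.name] = value
    return out


@dataclass
class ReprTest:
    optional_exclude: Optional[str] = field(default=None, metadata={"exclude": lambda x: x is None})
    optional_no_repr: Optional[str] = field(default=None, repr=False)


r1 = ReprTest()
r2 = ReprTest(optional_exclude="one", optional_no_repr="two")
print(r1, encode_without_excluded(r1))  # ReprTest(optional_exclude=None) {'optional_no_repr': None}
print(r2, encode_without_excluded(r2))  # ReprTest(optional_exclude='one') {'optional_exclude': 'one', 'optional_no_repr': 'two'}
```

Keys stay in snake_case here because this sketch does no letter-case conversion.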

Answer 3 · 2023-12-17T19:32:29.000Z

If you still want this behaviour, you can use this as a reference:

from abc import ABC
from dataclasses import dataclass, fields, field
from typing import List, Optional

@dataclass
class DataclassSmartRepr(ABC):
    def __repr__(self):
        tokens: List[str] = list()
        
        for f in fields(self):
            if (f.repr and (v := getattr(self, f.name, None)) is not None):
                tokens.append(f'{f.name}={v!r}')
        
        return f"{type(self).__name__}({', '.join(tokens)})"

@dataclass
class ReprTest:
    optional_with_repr_one: Optional[str] = field(default=None, repr=True)
    optional_with_repr_two: Optional[str] = field(default=None, repr=True)
    optional_no_repr: Optional[str] = field(default=None, repr=False)
    
    __repr__ = DataclassSmartRepr.__repr__

DataclassSmartRepr.register(ReprTest)

r1 = ReprTest()
r2 = ReprTest(optional_with_repr_one='one', optional_with_repr_two='two', optional_no_repr='three')

print("FIRST:")
print(r1)
print()

print("SECOND:")
print(r2)

Output:

FIRST:
ReprTest()

SECOND:
ReprTest(optional_with_repr_one='one', optional_with_repr_two='two')
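If you would rather not assign __repr__ and call register by hand, a mixin is one possible alternative. This is a sketch under assumptions, not something tested in this thread; SmartReprMixin, Image and PlainImage are hypothetical names. The catch is that @dataclass writes its own __repr__ onto any class that does not define one in its own body, so a plain subclass would shadow the inherited method. Passing repr=False to the decorator keeps the mixin's version.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


class SmartReprMixin:
    """Hypothetical mixin: show only fields with repr=True and a non-None value."""

    def __repr__(self) -> str:
        tokens: List[str] = []
        for f in fields(self):
            value = getattr(self, f.name)
            if f.repr and value is not None:
                tokens.append(f"{f.name}={value!r}")
        return f"{type(self).__name__}({', '.join(tokens)})"


# repr=False stops @dataclass from generating __repr__, so the mixin's method is inherited.
@dataclass(repr=False)
class Image(SmartReprMixin):
    pk: str
    optional_attribute_1: Optional[str] = None
    optional_attribute_2: Optional[str] = None


# Default repr=True: the generated __repr__ is set on the subclass and shadows the mixin.
@dataclass
class PlainImage(SmartReprMixin):
    pk: str
    optional_attribute_1: Optional[str] = None


print(Image(pk="1", optional_attribute_1="blah"))  # Image(pk='1', optional_attribute_1='blah')
print(PlainImage(pk="1"))                          # PlainImage(pk='1', optional_attribute_1=None)
```

PlainImage shows the pitfall: with the default repr=True, the generated method wins and None fields are printed again.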

Answer 4 · 2023-12-17T19:55:51.000Z

Thanks @USSX-Hares, I understand now.