lidatong/dataclasses-json

Dataclasses containing variables that reference themselves in list/dict fail to re-create as original type

Closed this issue · 13 comments

Description

When a dataclass has a field typed as a list/dict of its own type, converting an object to a dict and back with from_dict leaves plain dictionaries in the inner self-typed field instead of instances of the class.

Code snippet that reproduces the issue

from dataclasses import dataclass
from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class SpecialLinkedList:
    val: ...
    nexts: list['SpecialLinkedList'] = None



my_list = SpecialLinkedList(val=1, nexts=[SpecialLinkedList(val=2)])

print(my_list == SpecialLinkedList.from_dict(my_list.to_dict()))  # False

Describe the results you expected

The code snippet above outputs False.
The values are:

SpecialLinkedList.from_dict(my_list.to_dict()) == SpecialLinkedList(val=1, nexts={'1': {'val': 2, 'nexts': None}})
my_list == SpecialLinkedList(val=1, nexts={'1': SpecialLinkedList(val=2, nexts=None)})

Python version you are using

3.10

Environment description

Clean project, only dataclasses and dataclasses-json.

When I do

SpecialLinkedList.from_dict(my_list.to_dict())

/lib/python3.11/site-packages/dataclasses_json/core.py:184: RuntimeWarning: `NoneType` object value of non-optional type nexts detected when decoding SpecialLinkedList.
  warnings.warn(

SpecialLinkedList(val=1, nexts=[SpecialLinkedList(val=2, nexts=None)])

Which is correct? In your snippet, you are comparing class references, which will always output false since those are different instances.

Yeah, of course. In my example it's more of a pseudo-code comparison. If you look at the result value of SpecialLinkedList.from_dict(my_list.to_dict()), you'll see that the nexts attribute holds a dict instead of a SpecialLinkedList, as expected and as type hinted in the SpecialLinkedList class.
That's the problem: the inner self reference is not being converted to the class itself; it remains a dict.
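
As a side note on the comparison itself, here is a minimal stdlib-only sketch (no dataclasses-json involved, and the field annotation is simplified to a plain list as an assumption for illustration). A dataclass generates an `__eq__` that compares field values, so two distinct instances with equal fields compare equal, and a `False` result points at a real difference such as a dict sitting where an instance should be.

```python
from dataclasses import dataclass


@dataclass
class SpecialLinkedList:
    val: int
    nexts: list = None  # simplified annotation, for illustration only


# Two separate instances with the same field values.
a = SpecialLinkedList(1, [SpecialLinkedList(2)])
b = SpecialLinkedList(1, [SpecialLinkedList(2)])

# Same outer shape, but the inner element is a plain dict,
# mirroring the round-trip result described in this issue.
c = SpecialLinkedList(1, [{'val': 2, 'nexts': None}])

print(a is b)  # False: different objects
print(a == b)  # True: dataclass equality compares fields
print(a == c)  # False: a dict is not equal to a SpecialLinkedList
```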

On a second look at your output, it seems the bug is not reproducing for you. Do you actually get nexts as a list of SpecialLinkedList objects?
I just tried it again and the bug still replicates for me.

EDIT
I just created another clean environment to test this out.
For reference, I'm using python3.10.
This is my pip3 freeze output:

dataclasses-json==0.5.14
marshmallow==3.20.1
mypy-extensions==1.0.0
packaging==23.1
typing-inspect==0.9.0
typing_extensions==4.7.1

The bug replicates

Interesting! I tested on 3.11. I will re-test using your env as described and circle back here.

Hey! 😄
Any updates?

hi @NiroHaim
Sorry, we have a bit of a backlog, but I have this on my list and will look into it hopefully this week, worst case next week :)

Hey! Any news on this?

hi @NiroHaim, not yet, but it is on the to-do list. Sorry to keep you waiting, but all team members have been a bit swamped for the past 2 months with both internal and OSS contributions, plus our ability to release to PyPI is severely impaired until GitHub fixes env protection in October. My current expectation is that I'll be able to identify the issue and send a PR, but the fix will see an actual release around October :(

So, confirmed: on 3.10 the behaviour is different:

import sys
from dataclasses import dataclass
from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class SpecialLinkedList:
    val: int
    nexts: list['SpecialLinkedList'] = None


my_list = SpecialLinkedList(val=1, nexts=[SpecialLinkedList(val=2)])

print(my_list)
# SpecialLinkedList(val=1, nexts=[SpecialLinkedList(val=2, nexts=None)])

sys.version_info
# sys.version_info(major=3, minor=10, micro=12, releaselevel='final', serial=0)

SpecialLinkedList.from_dict(my_list.to_dict())
# SpecialLinkedList(val=1, nexts=[{'val': 2, 'nexts': None}])

The issue is that in 3.10 the self-reference hint stays a string, lol, which causes this method to fail:

def _decode_items(type_args, xs, infer_missing):
    """
    This is a tricky situation where we need to check both the annotated
    type info (which is usually a type from `typing`) and check the
    value's type directly using `type()`.

    If the type_arg is a generic we can use the annotated type, but if the
    type_arg is a typevar we need to extract the reified type information
    hence the check of `is_dataclass(vs)`
    """
    def _decode_item(type_arg, x):
        if is_dataclass(type_arg) or is_dataclass(xs):
            return _decode_dataclass(type_arg, x, infer_missing)
        if _is_supported_generic(type_arg):
            return _decode_generic(type_arg, x, infer_missing)
        return x

    if _isinstance_safe(type_args, Collection) and not _issubclass_safe(type_args, Enum):
        return list(_decode_item(type_arg, x) for type_arg, x in zip(type_args, xs))
    return list(_decode_item(type_args, x) for x in xs)
  • python3.10
print(type_args, type(type_args))
SpecialLinkedList <class 'str'>
  • python3.11
print(type_args, type(type_args))
<class '__main__.SpecialLinkedList'> <class 'type'>

This is the reason: https://peps.python.org/pep-0673/. In 3.11 they finally added a proper Self type.
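
To see why a string item type falls through `_decode_item`, here is a small stdlib-only sketch. It is not library code: the `Node` class is a stand-in for `SpecialLinkedList`, and it inspects the raw field annotation rather than whatever dataclasses-json resolves internally. The raw annotation keeps the string on any Python version that accepts `list[...]`; whether the decoder ends up with the string or the class depends on how the hints were resolved, as in the 3.10 and 3.11 output above.

```python
import typing
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Node:
    val: int
    nexts: list['Node'] = None


# Raw annotation stored on the dataclass field. It is not passed through
# typing.get_type_hints, so no forward reference is resolved here.
nexts_type = {f.name: f.type for f in fields(Node)}['nexts']
item_types = typing.get_args(nexts_type)

print(item_types, type(item_types[0]))  # ('Node',) <class 'str'>

# A plain str is never a dataclass, so a check like
# is_dataclass(type_arg) fails and the dict item is returned unchanged.
print(is_dataclass(item_types[0]))  # False
print(is_dataclass(Node))           # True
```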

Linked a PR to fix this, will finalize a bit later
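
The linked PR is not reproduced here. Purely as an illustration of the general idea, a string item type could be mapped back to the class it names before the dataclass check runs. The helper `resolve_item_type` and its `namespace` argument below are hypothetical names, not part of dataclasses-json, and this is only a sketch under the assumption that the referenced class can be looked up by name.

```python
from dataclasses import dataclass, is_dataclass


@dataclass
class Node:
    val: int
    nexts: list['Node'] = None


def resolve_item_type(type_arg, namespace):
    """Hypothetical helper: map a string annotation to the object it names.

    Non-string arguments and unknown names are returned unchanged.
    """
    if isinstance(type_arg, str):
        return namespace.get(type_arg, type_arg)
    return type_arg


# Explicit lookup table for the sketch; a real implementation would need
# some way to find the names visible where the class was defined.
namespace = {'Node': Node}

resolved = resolve_item_type('Node', namespace)
print(resolved is Node, is_dataclass(resolved))  # True True
print(resolve_item_type(int, namespace))         # <class 'int'>
print(resolve_item_type('Missing', namespace))   # Missing
```

Once the string is replaced by the class, a check such as `is_dataclass(type_arg)` succeeds and each inner dict can be decoded into an instance.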

@NiroHaim please take a look at the linked PR - should fix this issue