.to_dict() and .to_json() attribute names are always camel cased

Question

Opened this issue 2 years ago · 5 comments

Dataclass Wizard version: 0.22.1
Python version: 3.9
Operating System: macOS 12.4

Description

The documentation is confusing: it notes that attribute names are returned in camel case from .to_json(), regardless of the attribute names in the dataclass definition.
Why is this behaviour used, rather than returning the attribute names as shown by repr()?

This breaks a lot of downstream logic, because the keys returned by .to_dict() and .to_json() do not match the attribute names in the dataclass.

Additional observations

I really like this module, as it does an excellent job of handling types that the basic json module does not handle properly.
Unfortunately, I would now have to write a serialization hook to map the attribute names back to their original names so that my projects function as expected,
or build a custom mixin (which I would prefer not to do, as this module already does this) that uses simplejson for better type handling than the standard json module.

What I Did

example

from dataclasses import dataclass
from typing import Optional
from dataclass_wizard import JSONWizard

@dataclass
class Something(JSONWizard):
    user_id: Optional[str]
    access_token: str
    expires: int
    some_type: str

some = Something(
  expires=3600,
  some_type="hello",
  user_id="1235-1235",
  access_token="abcd-12345-hjgas-12365",
)

>>> print(f"class reps: {repr(some)}")
class reps: Something(user_id='1235-1235', access_token='abcd-12345-hjgas-12365', expires=3600, some_type='hello')
>>> print(f"class dict: {some.to_dict()}")
class dict: {'userId': '1235-1235', 'accessToken': 'abcd-12345-hjgas-12365', 'expires': 3600, 'someType': 'hello'}
>>> print(f"class json: {some.to_json()}")
class json: {"userId": "1235-1235", "accessToken": "abcd-12345-hjgas-12365", "expires": 3600, "someType": "hello"}
>>>

expected results

repr(), .to_dict() and .to_json() should all use the same attribute names.

Answer 1 · 2022-08-15T04:31:40.000Z

Hi @circulon, thanks for opening this issue. I was curious to know if this handy workaround that was posted earlier in another issue could work for you, at least in the meantime.

Also including it below, just for completeness.

from dataclass_wizard import JSONWizard, DumpMeta

class JSONSnakeWizard(JSONWizard):
    """Helper for JSONWizard that ensures dumping to JSON puts keys in snake_case"""
    def __init_subclass__(cls, str=True):
        """Method for binding child class to DumpMeta"""
        super().__init_subclass__(str)
        DumpMeta(key_transform='SNAKE').bind_to(cls)

Then the only other change would be to update code to subclass from JSONSnakeWizard instead:

class Something(JSONSnakeWizard):
    ...

Answer 2 · 2022-10-11T14:54:12.000Z

> The documentation is confusing and notes that attributes names are returned as camel case from .to_json() regardless of the actual attribute name in the dataclass definition.
>
> Why is this behaviour used and not just return the attributes as per the repr()?
>
> This breaks many logic points as the returned attributes (from .to_dict and .to_json()) do not actually reflect the attribute names in the actual dataclass.

This is a very good point. The short answer is that when I was originally designing this library, it just "made sense" at the time: since JSON stands for JavaScript Object Notation, it seemed natural to follow the JS convention for key names, which is camelCase rather than snake_case. Of course, I can now understand why that is confusing when working in Python, where attribute names are snake_case by convention.

So, just adding a note: the plan is that the next major release (still TBD) will likely address this case. I.e., attribute or key names will be returned unchanged by default as part of the dump process. For example, if field names are snake-cased, they should also be snake-cased in the JSON object returned when to_dict or to_json is called; if field names are camel-cased, they should likewise stay camel-cased in the JSON output.

I plan to add a milestone to track this. Note, however, that it will likely need to land in a major version release (rather than a minor one), as it is a breaking change. That said, I definitely agree this is a good change to implement, not least so that there is less confusion overall.
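
To make the difference concrete, here is a minimal standard-library sketch contrasting the two behaviours. It is not how dataclass-wizard implements key transforms; `to_camel` and `dump` are hypothetical helpers written only for this illustration, and `dump` handles flat dataclasses only.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Optional


def to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase (hypothetical helper)."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def dump(obj, key_transform: Optional[Callable[[str], str]] = None) -> dict:
    """Dump a flat dataclass to a dict, optionally renaming the keys.

    Hypothetical helper: with key_transform=None the keys are left unchanged.
    """
    data = asdict(obj)
    if key_transform is None:
        return data
    return {key_transform(key): value for key, value in data.items()}


@dataclass
class Something:
    user_id: Optional[str]
    access_token: str
    expires: int
    some_type: str


some = Something(user_id="1235-1235", access_token="abcd", expires=3600, some_type="hello")

camel = dump(some, key_transform=to_camel)  # camelCase keys, like the current default
plain = dump(some)                          # keys unchanged, like the planned default
print(camel)
print(plain)
```

With no transform, the keys in the output match the field names shown by repr(), which is the behaviour requested in this issue.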

Answer 3 · 2024-11-27T20:22:47.000Z

I know it's been a while, but it's 2024 and a lot of changes have been made. On the roadmap for V1 is ensuring that no key transform is applied in the dump process.

Accordingly, I've added a Mixin class JSONPyWizard that does exactly this, and also added a note that this will be the default behavior in V1.

Answer 4 · 2024-11-27T20:25:13.000Z

Re-opening this issue because I do 100% understand where you're coming from. I've also had similar trouble lately, and realized that the design decision of camelCase was perhaps an ill-advised choice 😞.

That said, 2024 is winding down, and the roadmap for V1 includes ensuring that no key transform is applied in the dump process.

Accordingly, I've added a Mixin class JSONPyWizard that does exactly this, and also added a note that this will be the default behavior in V1.

Feel free to follow my announcement on #153 to keep up-to-date on what's expected in V1. Thanks!

Answer 5 · 2024-11-27T23:53:10.000Z

@rnag
Thanks for reopening this, I ended up using something else...

In my case I needed performance for serialization of complex nested dataclasses.
I found a performance enhancement for asdict() and astuple() in Python 3.12.
I based a gist, dataclass_util.py, on this for use with Python 3.11, with additional enhancements.
This reduced runtimes in my AWS Lambda functions considerably.
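
For anyone curious about the general idea, below is a rough sketch of a recursive dataclass-to-dict conversion that returns immutable leaf values directly instead of copying them. As far as I understand, skipping copies for such values is where a good part of the speed-up comes from. This is not the code from the gist or from CPython; `fast_asdict` and `_ATOMIC_TYPES` are hypothetical names, and the handling of unknown types is an assumption.

```python
from dataclasses import dataclass, fields, is_dataclass

# Immutable leaf types that can be returned as-is, without copying.
_ATOMIC_TYPES = (str, int, float, bool, bytes, type(None))


def fast_asdict(obj):
    """Recursively convert a dataclass instance to plain dicts and lists.

    Hypothetical helper; a sketch of the idea, not the gist's implementation.
    """
    if isinstance(obj, _ATOMIC_TYPES):
        return obj  # fast path: nothing to copy
    if is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: fast_asdict(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, (list, tuple)):
        # Note: tuples are flattened to lists here, unlike dataclasses.asdict().
        return [fast_asdict(item) for item in obj]
    if isinstance(obj, dict):
        return {key: fast_asdict(value) for key, value in obj.items()}
    return obj  # assumption: unknown types are passed through uncopied


@dataclass
class Token:
    access_token: str
    expires: int


@dataclass
class Session:
    user_id: str
    tokens: list


session = Session(user_id="1235-1235", tokens=[Token("abcd", 3600)])
result = fast_asdict(session)
print(result)
```

Field names are kept exactly as declared, so the output keys stay in snake_case.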

Cheers