[wiz-cli] duplicate dataclass schemas should be replaced with just one
rnag opened this issue · 0 comments
- Dataclass Wizard version: 0.22.1
- Python version: 3.10
- Operating System: Mac OS
Description
In certain cases, and especially in certain API responses (most notably for AWS Rekognition), the input JSON object can contain multiple definitions for the same field, for example "element", all of which contain an identical schema.
I'd like to eliminate those duplicate dataclass definitions in the output, so that the generated schema is a bit less verbose and we only have the data we care about.
For example, note the below sample input and output.
What I Did
I ran the following command from my mac terminal:
echo '{
    "element": {
        "my_str": "string",
        "my_int": 3
    },
    "Elements": [
        {
            "my_str": "hello",
            "my_int": 5
        },
        {
            "myStr": "world",
            "MyInt": 7
        }
    ],
    "other_field": {
        "element": {
            "my_str": "other string",
            "my_int": 42
        }
    }
}' | wiz gs
The generated output is a bit noisy in this scenario, as it contains duplicate definitions of the dataclass Element:
from dataclasses import dataclass
from typing import List

from dataclass_wizard import JSONWizard


@dataclass
class Data(JSONWizard):
    """
    Data dataclass

    """
    element: 'Element'
    elements: List['Element']
    other_field: 'OtherField'


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int


@dataclass
class OtherField:
    """
    OtherField dataclass

    """
    element: 'Element'


@dataclass
class Element:
    """
    Element dataclass

    """
    my_str: str
    my_int: int
I'd like to eliminate all the duplicate definitions, preferably by trimming any duplicates after the first dataclass schema for Element.
Resolution
There are multiple ways to achieve this, but I think the easiest might be to store the generated string or __repr__ for the schema in a dict with the class name as the key, and then look up and compare whether those string definitions are the same. If so, we just continue and return an empty __repr__ after the first time. If not, we generate all the field names and types for the dataclass as normal.
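The idea above could look roughly like the sketch below. It is a standalone illustration, not the actual wiz-cli internals: the function name `dedupe_class_definitions` and the `(class_name, source)` input pairs are hypothetical, and I'm assuming the generator can hand over each class's rendered source string in output order. It also keeps a set of seen sources per class name rather than a single string, so two different schemas that happen to share a name are both preserved.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dedupe_class_definitions(
    definitions: Iterable[Tuple[str, str]],
) -> List[str]:
    """Drop repeated class definitions, keeping the first occurrence.

    `definitions` yields (class_name, generated_source) pairs in the
    order they would be written to the output. Both the function and
    this input shape are hypothetical, for illustration only.
    """
    # class name -> every distinct source string already emitted for it
    seen: Dict[str, Set[str]] = {}
    kept: List[str] = []

    for name, source in definitions:
        sources = seen.setdefault(name, set())
        if source in sources:
            # Identical schema already emitted under this name: skip it.
            continue
        # First time we see this exact schema for this name: emit it.
        sources.add(source)
        kept.append(source)

    return kept
```

With the sample input from this issue, the three identical Element definitions would collapse to one, while Data and OtherField are left untouched. A definition named Element with different fields would still be emitted, since only exact string matches are treated as duplicates.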