Documentation does not explain unstruct_collection_overrides' input
td-anne opened this issue · 4 comments
- cattrs version: 23.2.3
- Python version: 3.11.4
- Operating System: MacOS
Description
I am trying to unstructure sets into something that JSON can cope with and that is canonical (equal for any two equal sets; set iteration order can differ for equal sets). I tried unstruct_collection_overrides={Set: sorted}
and obtained a very confusing error ("< not supported between dict and dict"). It turns out that the problem is that the functions in unstruct_collection_overrides
receive a generator that emits the unstructured contents of the collection (at least for Set; I'm not sure for dicts). Since the objects in my set are frozen attrs objects, they themselves work fine in a Set (and being sorted), but once unstructured they become dictionaries, which can't be sorted.
[Edited to add: my conclusion is that this isn't really possible; that's not the bug, that's just an unfortunate limitation.]
The problem: the documentation at https://catt.rs/en/stable/unstructuring.html#customizing-collection-unstructuring never specifies what the unstructuring functions receive as input. Neither does the docstring for Converter (https://catt.rs/en/stable/cattrs.html#cattrs.Converter). A little explanation would have helped tremendously in tracking down the mysterious error.
What I Did
@frozen
class A:
x: int = 0
y: int = 0
def convert_set(s):
list_s = list(s)
print(f"{list_s=}")
return sorted(list_s)
converter = Converter(unstruct_collection_overrides={Set: convert_set})
converter.unstructure({A(x=1), A(x=2), A(x=3)})
gives
list_s=[{'x': 3, 'y': 0}, {'x': 1, 'y': 0}, {'x': 2, 'y': 0}]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[12], [line 15](vscode-notebook-cell:?execution_count=12&line=15)
[10](vscode-notebook-cell:?execution_count=12&line=10) return sorted(list_s)
[13](vscode-notebook-cell:?execution_count=12&line=13) converter = Converter(unstruct_collection_overrides={Set: convert_set})
---> [15](vscode-notebook-cell:?execution_count=12&line=15) converter.unstructure({A(x=1), A(x=2), A(x=3)})
File [~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:238](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:238), in BaseConverter.unstructure(self, obj, unstructure_as)
[237](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:237) def unstructure(self, obj: Any, unstructure_as: Any = None) -> Any:
--> [238](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:238) return self._unstructure_func.dispatch(
[239](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:239) obj.__class__ if unstructure_as is None else unstructure_as
[240](https://file+.vscode-resource.vscode-cdn.net/Users/annearchibald/projects/da-dropper-wire-data/analysis/~/Library/Caches/pypoetry/virtualenvs/da-dropper-wire-data-l5CQQYIb-py3.11/lib/python3.11/site-packages/cattrs/converters.py:240) )(obj)
File :2, in unstructure_iterable(iterable)
Cell In[12], [line 10](vscode-notebook-cell:?execution_count=12&line=10)
[8](vscode-notebook-cell:?execution_count=12&line=8) list_s = list(s)
[9](vscode-notebook-cell:?execution_count=12&line=9) print(f"{list_s=}")
---> [10](vscode-notebook-cell:?execution_count=12&line=10) return sorted(list_s)
TypeError: '<' not supported between instances of 'dict' and 'dict'
as opposed to some kind of canonically sorted list.
Howdy!
Sorry to hear you had issues with this. Documentation is always a work in progress. Would you care to submit a PR for the docs dealing with this, to help the next person? I'll gladly review and merge it.
my conclusion is that this isn't really possible
So what exactly are you trying to accomplish? Forcing a consistent ordering of sets of frozen attrs classes in json? I love a challenge.
If you're willing to make the attrs classes orderable, you might try something like this:
from typing import get_args
from attrs import AttrsInstance, frozen, has
from cattrs import Converter
from cattrs._compat import is_mutable_set
c = Converter()
@frozen(order=True)
class A:
x: int = 0
y: int = 0
def unstructure_sets(val: set[AttrsInstance]) -> list:
return [c.unstructure(v) for v in sorted(val)]
c.register_unstructure_hook_func(
lambda t: is_mutable_set(t) and has(get_args(t)[0]), unstructure_sets
)
print(c.unstructure({A(x=1), A(x=2), A(x=3)}, unstructure_as=set[A]))
(This can be made faster by adding some complexity.)
Aha! That is indeed what I was trying to do. I think the magic input here is the nearly undocumented unstructure_as
argument. Dropping it results in the set unstructuring seeing the contents as dictionaries.
I don't mind writing up a doc PR, but I don't think I understand how the unstructuring procedure works. Presumably it takes the tree of objects and unstructures from the leaves up? That is, the unstructuring procedure for collections is "unstructure all the contents, put them in a collection of the same kind, and then use unstruct_collection_overrides
to convert the new collection? How does this interact with unstructure_as
?
I think the following achieves what I was aiming for:
def unstructure_set(val: set) -> list:
vals = list(val)
try:
return [c.unstructure(v) for v in sorted(vals)]
except TypeError as e:
print(f"Whoops, can't sort: {e}")
return [c.unstructure(v) for v in vals]
c = Converter()
c.register_unstructure_hook_func(
lambda t: issubclass(t, set)
or issubclass(getattr(t, "__origin__", type(None)), set),
unstructure_set,
)
print(c.unstructure({1, 2, -1}))
@frozen(order=True)
class A:
x: int = 0
y: int = 0
print(c.unstructure({A(x=4), A(x=2), A(x=3)}))
print(c.unstructure({A(x=4), A(x=2), A(x=3)}, unstructure_as=set[A]))
@frozen
class B:
x: int = 0
y: int = 0
print(c.unstructure({B(x=4), B(x=2), B(x=3)}))
print(c.unstructure({B(x=4), B(x=2), B(x=3)}, unstructure_as=set[B]))
That is, c
unstructures set
as list
, sorted if possible but unsorted if the elements don't support ordering.
I don't think I can do this with unstruct_collection_overrides
because that gets the contents already unstructured. This solution also allows users to include unstructure_as
or not, as desired; if I weren't concerned about allowing unstructure_as
I could do this with a simpler register_unstructure_hook(set, unstructure_set)
.
Dicts are amenable to a similar approach, I think, but they can also be handled at the JSON level with sort_keys=True
. And if dicts are to be unstructured into a JSON-compatible way the keys all need to be converted to unique strings anyway, so those will always be sortable post-unstructuring. Thus for dicts I think we could get away with an unstruct_collection_overrides
without any additional requirement.
In practice I ran into another complexity: one of my objects has an Identifier
field, which I would like to be a union of three different (frozen(order=True)
) attrs
classes. I would like to be able to unstructure sets of identifiers into a sorted list. Unfortunately I haven't been able to get the different classes to compare to each other, even if I try making them subclasses of a common class. Thus I had to write an Identifier.key()
function that extracts the key for each; so a sorting unstructure function needs to know to call that function on them. Which I think means I need to go back to requiring users to specify unstructure_as
.
Which I think means I need to go back to requiring users to specify unstructure_as.
Are you wrapping cattrs in a framework of sorts? I've done that a lot and have almost always been able to know exactly what I was unstructuring, although that might not be the case for you unless you lean on type hinting a lot.
I played around with your example; instead of trying to sort and then dealing with a possible exception, I tried this:
c = Converter()
def unstructure_ordered_set(val: set) -> list:
return c.unstructure(sorted(val))
c.register_unstructure_hook_func(
lambda t: issubclass(getattr(t, "__origin__", type(None)), set)
and is_orderable(t.__args__[0]),
unstructure_ordered_set,
)
def is_orderable(type: type) -> bool:
return type.__lt__ != object.__lt__
Obviously it's not the same as yours (only works for orderable classes) but it seems to work ;)