CMUSTRUDEL/DIRTY

Serialization of Unions uses wrong type tag

jlacomis opened this issue · 0 comments

I received an email from someone confused by the Void type appearing in data that looked like a Union. It turns out there is a subtle bug in the serialization code here:

DIRTY/binary/dire_types.py

Lines 918 to 924 in f1f24f4

def _to_json(self) -> t.Dict[str, t.Any]:
    return {
        "T": 8,
        "n": self.name,
        "m": [m._to_json() for m in self.members],
        "p": self.padding,
    }

Tag 8 is the Void meta-tag (Union is 7), so this caused Unions to be serialized as Void. During deserialization, they are treated as Void types:

DIRTY/binary/dire_types.py

Lines 1041 to 1065 in f1f24f4

@staticmethod
def read_metadata(d: t.Dict[str, t.Any]) -> "TypeLibCodec.CodecTypes":
    classes: t.Dict[
        t.Union[int, str],
        t.Union[
            t.Type["TypeLib"],
            t.Type["TypeLib.EntryList"],
            t.Type["TypeInfo"],
            t.Type["UDT.Member"],
        ],
    ] = {
        "E": TypeLib.EntryList,
        0: TypeLib,
        1: TypeInfo,
        2: Array,
        3: Pointer,
        4: UDT.Field,
        5: UDT.Padding,
        6: Struct,
        7: Union,
        8: Void,
        9: FunctionPointer,
        10: Disappear,
    }
    return classes[d["T"]]._from_json(d)
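
For reference, here is a minimal sketch of what the corrected serializer would presumably look like, assuming Union should carry tag 7 as the table above indicates. The `Member` and `Union` classes below are hypothetical stand-ins that keep only the attributes `_to_json` touches; they are not the real classes from dire_types.py.

```python
import typing as t


class Member:
    """Hypothetical stand-in for a union member (not the real UDT.Member)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _to_json(self) -> t.Dict[str, t.Any]:
        # Illustrative payload only; the real member format may differ.
        return {"T": 4, "n": self.name}


class Union:
    """Hypothetical stand-in holding only what _to_json needs."""

    def __init__(
        self, name: str, members: t.List[Member], padding: t.Any = None
    ) -> None:
        self.name = name
        self.members = members
        self.padding = padding

    def _to_json(self) -> t.Dict[str, t.Any]:
        return {
            "T": 7,  # Union tag per read_metadata; the buggy code wrote 8 (Void)
            "n": self.name,
            "m": [m._to_json() for m in self.members],
            "p": self.padding,
        }
```

With this change, newly serialized Unions would round-trip through `read_metadata` as Union rather than Void. Data that was already written with tag 8 is not affected by the fix.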

The serialization bug is simple enough to fix, but every Union in the current dataset has already been written with the Void tag. I will fix the current dataset, but if you've already downloaded it or don't want to wait, you'll have to modify the read_metadata method to check whether d has the fields a Union carries, for example by replacing line 1065 in dire_types.py with this (untested) code:

if d["T"] == 8:
    if "m" in d:
        return Union._from_json(d)
    return Void._from_json(d)
return classes[d["T"]]._from_json(d)
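
The same check can be tried in isolation with the rough sketch below. It assumes that a serialized Void never contains an "m" key, which is what the workaround relies on. `Void`, `Union`, and `read_tag_8` here are hypothetical stubs for illustration, not the real classes or methods in dire_types.py.

```python
import typing as t


class Void:
    """Hypothetical stub for the real Void type."""

    @classmethod
    def _from_json(cls, d: t.Dict[str, t.Any]) -> "Void":
        return cls()


class Union:
    """Hypothetical stub for the real Union type; members are kept as raw JSON."""

    def __init__(self, name: t.Optional[str], members: t.List[t.Any]) -> None:
        self.name = name
        self.members = members

    @classmethod
    def _from_json(cls, d: t.Dict[str, t.Any]) -> "Union":
        return cls(d.get("n"), d["m"])


def read_tag_8(d: t.Dict[str, t.Any]) -> t.Union[Void, Union]:
    """Disambiguate an entry tagged 8 in data written by the buggy serializer."""
    if d["T"] != 8:
        raise ValueError("expected an entry with tag 8")
    # Assumption: only a mis-tagged Union carries a member list under "m".
    if "m" in d:
        return Union._from_json(d)
    return Void._from_json(d)
```

For example, `read_tag_8({"T": 8, "n": "u", "m": [], "p": None})` returns a `Union` stub, while `read_tag_8({"T": 8})` returns a `Void` stub.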