Durable Semantic Data Definitions
We avoid serialization and versioning hell across applications and over time by completely decoupling data semantics from applications and file formats.
The main idea is to define a standalone set of semantic data definitions on top of primitive data types. Each definition is uniquely identified by a never-changing (durable) globally unique id.
Primitives
Primitives are definitions that DO NOT depend upon other definitions.
For example, the definition of the signed two's-complement 32-bit little-endian integer type looks as follows:
"5ce108a4-a578-4edb-841d-068393ed93bf": {
"name": "Int32",
"description": "Signed 32-bit integer. 2-complement. Little endian."
}
Here, 5ce108a4-a578-4edb-841d-068393ed93bf is the definition's globally unique id, "name" is its friendly name, and "description" specifies the associated semantics.
A globally unique id is itself defined as:
"a81a39b0-8f61-4efc-b0ce-27e2c5d3199d": {
"name": "Guid",
"description": "Globally unique identifier (GUID, 16 bytes). https://tools.ietf.org/html/rfc4122."
}
Serialization
Primitive values are serialized AS IS, without prepending the unique id.
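As a minimal sketch of what "as is" could look like for the two primitives defined above, the following Python snippet writes an Int32 and a Guid. The function names are hypothetical, and the byte order of the 16 Guid bytes is an assumption (RFC 4122 network order, as returned by Python's uuid module); the text above does not specify it.

```python
import struct
import uuid


def write_int32(value: int) -> bytes:
    # Int32: signed, two's complement, little endian, 4 bytes, no id prefix.
    return struct.pack("<i", value)


def write_guid(value: uuid.UUID) -> bytes:
    # Guid: 16 raw bytes, no id prefix.
    # Assumption: RFC 4122 byte order; other orderings exist in practice.
    return value.bytes


encoded_int = write_int32(-2)
encoded_guid = write_guid(uuid.UUID("a81a39b0-8f61-4efc-b0ce-27e2c5d3199d"))
```

Note that neither result carries any type information: a reader has to know from context which definition applies.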
Arrays
Arrays are defined by appending [] to the friendly name of the element type, e.g.:
"1cfa6f68-5b56-44a7-b4b5-bd675bc910ab": {
"name": "Int32[]",
"description": "Array of signed 32-bit integers. 2-complement. Little endian."
}
This states that type Int32[] (1cfa6f68-5b56-44a7-b4b5-bd675bc910ab) represents an array of values of type Int32 (5ce108a4-a578-4edb-841d-068393ed93bf, see above).
Serialization
A binary serialization of an array starts with the length of the array (the number of elements), given as an Int32 value (5ce108a4-a578-4edb-841d-068393ed93bf), followed by that many elements, each serialized according to the element type.
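The length-prefixed layout can be sketched in Python for Int32[] as follows. The function names are hypothetical and the code is an illustration, not a reference implementation.

```python
import struct


def write_int32_array(values: list[int]) -> bytes:
    # Length prefix as Int32, then each element as Int32 (little endian).
    return struct.pack("<i", len(values)) + struct.pack(f"<{len(values)}i", *values)


def read_int32_array(buffer: bytes, offset: int = 0) -> tuple[list[int], int]:
    # Returns the decoded values and the offset just past the array.
    (count,) = struct.unpack_from("<i", buffer, offset)
    values = list(struct.unpack_from(f"<{count}i", buffer, offset + 4))
    return values, offset + 4 + 4 * count


encoded_array = write_int32_array([1, 2, 3])
decoded_array, next_offset = read_int32_array(encoded_array)
```

Returning the next offset from the reader makes it easy to decode several values stored back to back in one buffer.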
DurableMap
A DurableMap is specified as:
"f03716ef-6c9e-4201-bf19-e0cabc6c6a9a": {
"name": "DurableMap",
"description": "A map of key/value pairs, where keys are durable IDs with values of corresponding types."
}
Serialization
A binary serialization of a DurableMap starts with the number of entries, given as an Int32 value (5ce108a4-a578-4edb-841d-068393ed93bf), followed by that many key/value pairs. Each key is serialized as a 16-byte unique id (a81a39b0-8f61-4efc-b0ce-27e2c5d3199d), and each value is serialized according to its definition.
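The following Python sketch illustrates this layout for a map with two entries. The WRITERS table is hypothetical (a real implementation would presumably derive it from the definitions), the Guid byte order is an assumption (RFC 4122 order), and using the Int32 and Int32[] ids as keys is only for illustration.

```python
import struct
import uuid

INT32 = uuid.UUID("5ce108a4-a578-4edb-841d-068393ed93bf")
INT32_ARRAY = uuid.UUID("1cfa6f68-5b56-44a7-b4b5-bd675bc910ab")


def write_int32(value: int) -> bytes:
    return struct.pack("<i", value)


def write_int32_array(values: list[int]) -> bytes:
    return write_int32(len(values)) + b"".join(write_int32(v) for v in values)


# Hypothetical lookup table: durable id -> writer for values of that definition.
WRITERS = {INT32: write_int32, INT32_ARRAY: write_int32_array}


def write_durable_map(entries: dict) -> bytes:
    out = [write_int32(len(entries))]        # number of entries as Int32
    for key, value in entries.items():
        out.append(key.bytes)                # 16-byte key (assumed RFC 4122 order)
        out.append(WRITERS[key](value))      # value according to its definition
    return b"".join(out)


encoded_map = write_durable_map({INT32: 7, INT32_ARRAY: [1, 2]})
```

Because the key identifies the definition of the value, a reader can pick the matching decoder for each entry without any additional type tag.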
Structures
If a definition is composed of other definitions, this is expressed with a "layout" entry:
"ad8adcb6-8cf1-474e-99da-851343858935": {
"name": "V3f",
"description": "A 3-dimensional vector of 32-bit floats.",
"layout": {
"X": "Float32",
"Y": "Float32",
"Z": "Float32"
}
}
Serialization
The order of entries in the layout definition matters. When used for binary (de)serialization, it specifies the data layout exactly. There is no implied or implicit padding. This means that a binary serialization of the above vector consists of exactly 12 bytes: the first 4 bytes contain the x-coordinate as a little-endian 32-bit floating point value (Float32, 23fb286f-663b-4c71-9923-7e51c500f4ed), followed by 4 bytes for the y-coordinate and 4 bytes for the z-coordinate.
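A minimal Python sketch of the 12-byte V3f layout, assuming hypothetical helper names. The "<" prefix in the format string selects little-endian byte order and disables alignment padding, which matches the "no padding" rule.

```python
import struct

# X, Y, Z as three little-endian Float32 values, in layout order, no padding.
V3F_LAYOUT = struct.Struct("<3f")


def write_v3f(x: float, y: float, z: float) -> bytes:
    return V3F_LAYOUT.pack(x, y, z)


def read_v3f(buffer: bytes, offset: int = 0) -> tuple:
    return V3F_LAYOUT.unpack_from(buffer, offset)


encoded_v3f = write_v3f(1.0, 2.0, 3.0)
decoded_v3f = read_v3f(encoded_v3f)
```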
Semantic Definitions
The following definition gives semantic meaning to an array of 3-dim vectors of 32-bit floats:
"712d0a0c-a8d0-42d1-bfc7-77eac2e4a755": {
"name": "Octree.Normals3f",
"description": "Octree. Per-point normals (V3f[]).",
"type": "V3f[]"
}
Here, "type" specifies the underlying definition, in this case V3f[].
Serialization
Semantic values (values whose definition has a "type" field) are serialized by first writing the 16-byte unique id, followed by the actual value as specified by the "type" field (which may itself be a semantic value, recursively).
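For Octree.Normals3f, this rule could be sketched in Python as shown below: the 16-byte id comes first, then the V3f[] payload (Int32 length followed by 12 bytes per vector). The function names are hypothetical, and the Guid byte order is again an assumption (RFC 4122 order).

```python
import struct
import uuid

OCTREE_NORMALS3F = uuid.UUID("712d0a0c-a8d0-42d1-bfc7-77eac2e4a755")


def write_v3f_array(vectors: list) -> bytes:
    # V3f[]: Int32 length, then X, Y, Z as little-endian Float32 per element.
    out = [struct.pack("<i", len(vectors))]
    out.extend(struct.pack("<3f", *v) for v in vectors)
    return b"".join(out)


def write_octree_normals3f(normals: list) -> bytes:
    # Semantic value: 16-byte id first, then the value of the underlying type.
    return OCTREE_NORMALS3F.bytes + write_v3f_array(normals)


encoded_normals = write_octree_normals3f([(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)])
```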
Rules
- a definition NEVER changes
- should there be the need to change a definition (a.k.a. versioning), then a new definition with a different unique id is created
Content
Definitions are contained in definitions.json, which can be used for code generation and (de)serialization.
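As a rough illustration of how such a file might be consumed, the Python sketch below parses an inline excerpt and builds a name-to-id index. It assumes the file is a single JSON object keyed by id, in the shape of the examples above; the actual top-level structure of definitions.json may differ, and the helper names are hypothetical.

```python
import json

# Inline excerpt in the shape of the examples above (assumed file structure).
DEFINITIONS_JSON = """
{
  "ad8adcb6-8cf1-474e-99da-851343858935": {
    "name": "V3f",
    "description": "A 3-dimensional vector of 32-bit floats.",
    "layout": { "X": "Float32", "Y": "Float32", "Z": "Float32" }
  },
  "712d0a0c-a8d0-42d1-bfc7-77eac2e4a755": {
    "name": "Octree.Normals3f",
    "description": "Octree. Per-point normals (V3f[]).",
    "type": "V3f[]"
  }
}
"""

definitions = json.loads(DEFINITIONS_JSON)

# Index from friendly name to durable id, e.g. to resolve "type" references.
id_by_name = {entry["name"]: guid for guid, entry in definitions.items()}


def layout_fields(name: str) -> list:
    # json.loads keeps key order, which matters because layout order is binding.
    return list(definitions[id_by_name[name]]["layout"].items())


v3f_fields = layout_fields("V3f")
```

A code generator could walk the same index to emit one reader and one writer per definition.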