Optional fields in structs
aboodman opened this issue · 12 comments
Our lack of optional fields quickly leads to combinatorial explosion of types. If we have data like:
List(
Struct({a:int}),
Struct({b:int}),
Struct({a:int, b:int}),
)
... we infer the type as:
List<
Struct{a:int} |
Struct{b:int} |
Struct{a:int, b:int}
>
I think we should instead infer (conceptually):
List<Struct{a:?int, b:?int}>
I'm still not sure exactly how optional should work in Noms though. It could be a special concept (optional), or there could be a new type Nil
that just means no value is acceptable (so you'd actually see: List<Struct{a:int|nil, b:int|nil}>
.
Not sure what the tradeoff is.
I was wondering today again if there was a way to do this without being a serialization change.
I think conceptually, you could represent optional like this:
struct T {
foo: int|struct Nil{}
}
The nil struct could just be a convention. But I'm not sure if in practice this idea is compatible with our actual current serialization. @arv, what do you think is this workable?
That would work.
But it would not lead to smaller types unless we change makeCommitType or makeUnionType.
@kalman and i were talking about this again today.
Strawman:
- Establish a convention in noms format
7
thatT|struct Nil{}
means "optional". - Change type validation to know about this convention. E.g., given
struct Foo{bar: Number|Nil}
andf := types.NewStruct("Foo", {}}
,f
is special-cased to be valid instance ofFoo
.
That would work.
Here are some implementation ideas.
Only struct types need an optional bit. A struct value does not have a notion of optional. At this point all values carry a type. We will assert that there are no optional bits sets for values.
Currently we have two slices, one with names and one with types. We will add another one for optionality bit. This can be implemented as a bit set (https://godoc.org/xojoc.pw/bitset) or a slice of bool.
Places we need to update:
- encoding/decoding
- Is subtype
- Type simplification
- ngql
- marshal
- nomdl parser
- human encoded serialization (strawman use
fieldName?: type
) - OID (oid is going away so maybe we can ignore this?)
- ContainCommonSuperType
- WalkDifferentStructs (can now take type predicate and prune search)
The BR type functions ContainCommonSuperType (and one other, unless I miss-remember).
I think some of other changes coming will precipitate some bigger changes in the encoding (I've been thinking alot about it and plan to write a design doc), so I wouldn't spend alot of time on compactness (e.g. bitfield) with this right now. I'd humbly suggest alternating field name & optional flag (rather than two slices), but if you feel strongly, just go for it.
@rafael-atticlabs I've started working on this and the encoding we have now is string, type, string type, ...
so it will be string, type, bool, string, type, bool, ...
How we represent StructDesc is not really important (it is currently two slices and I might make it a single slice of structs)
I'm aware of some of the changes we have planned but I didn't want to hold off optional types at this point. The extra work is minimal at this point.
SGTM. Totally agree. Getting this done is blocking BR working properly.
The current API I'm using is
types.MakeStructType2("Name",
types.StructField{"x", BoolType, true},
types.StructField{Name: "y", Type: StringType, Optional: false},
)
The old MakeStructType
and MakeStructTypeFromFields
creates only required fields. My approach is to land with a strange name, update all usages to use MakeStructType2
and then rename MakeStructType2
to MakeStructType
and remove the other two.
- The question is what API we want?
- Can the slice and field type names be made clearer?
- I was thinking we would do the sorting of the fields