attic-labs/noms

Optional fields in structs

aboodman opened this issue · 12 comments

Our lack of optional fields quickly leads to combinatorial explosion of types. If we have data like:

List(
  Struct({a:int}),
  Struct({b:int}),
  Struct({a:int, b:int}),
)

... we infer the type as:

List<
  Struct{a:int} |
  Struct{b:int} |
  Struct{a:int, b:int}
>

I think we should instead infer (conceptually):

List<Struct{a:?int, b:?int}>

I'm still not sure exactly how optional should work in Noms though. It could be a special concept (optional), or there could be a new type Nil that just means no value is acceptable (so you'd actually see: List<Struct{a:int|nil, b:int|nil}>.

Not sure what the tradeoff is.

See also related #2326

I was wondering today again if there was a way to do this without being a serialization change.

I think conceptually, you could represent optional like this:

struct T {
  foo: int|struct Nil{}
}

The nil struct could just be a convention. But I'm not sure if in practice this idea is compatible with our actual current serialization. @arv, what do you think is this workable?

arv commented

That would work.

But it would not lead to smaller types unless we change makeCommitType or makeUnionType.

@kalman and i were talking about this again today.

Strawman:

  1. Establish a convention in noms format 7 that T|struct Nil{} means "optional".
  2. Change type validation to know about this convention. E.g., given struct Foo{bar: Number|Nil} and f := types.NewStruct("Foo", {}}, f is special-cased to be valid instance of Foo.

Discuss... @arv , @kalman

arv commented

That would work.

arv commented

Here are some implementation ideas.

Only struct types need an optional bit. A struct value does not have a notion of optional. At this point all values carry a type. We will assert that there are no optional bits sets for values.

Currently we have two slices, one with names and one with types. We will add another one for optionality bit. This can be implemented as a bit set (https://godoc.org/xojoc.pw/bitset) or a slice of bool.

Places we need to update:

  • encoding/decoding
  • Is subtype
  • Type simplification
  • ngql
  • marshal
  • nomdl parser
  • human encoded serialization (strawman use fieldName?: type)
  • OID (oid is going away so maybe we can ignore this?)
  • ContainCommonSuperType
  • WalkDifferentStructs (can now take type predicate and prune search)

The BR type functions ContainCommonSuperType (and one other, unless I miss-remember).

I think some of other changes coming will precipitate some bigger changes in the encoding (I've been thinking alot about it and plan to write a design doc), so I wouldn't spend alot of time on compactness (e.g. bitfield) with this right now. I'd humbly suggest alternating field name & optional flag (rather than two slices), but if you feel strongly, just go for it.

arv commented

@rafael-atticlabs I've started working on this and the encoding we have now is string, type, string type, ... so it will be string, type, bool, string, type, bool, ... How we represent StructDesc is not really important (it is currently two slices and I might make it a single slice of structs)

arv commented

I'm aware of some of the changes we have planned but I didn't want to hold off optional types at this point. The extra work is minimal at this point.

SGTM. Totally agree. Getting this done is blocking BR working properly.

arv commented

The current API I'm using is

types.MakeStructType2("Name", 
  types.StructField{"x", BoolType, true},
  types.StructField{Name: "y", Type: StringType, Optional: false},
)

The old MakeStructType and MakeStructTypeFromFields creates only required fields. My approach is to land with a strange name, update all usages to use MakeStructType2 and then rename MakeStructType2 to MakeStructType and remove the other two.

  • The question is what API we want?
  • Can the slice and field type names be made clearer?
  • I was thinking we would do the sorting of the fields
arv commented

I talked to @ehalpern about ContainCommonSuperType. To us it seems like the semantics will not change. I'll add a few more tests though.