stripe/skycfg

Non-determinism with oneofs

shubhitms opened this issue · 0 comments

Here's a reproduction of the issue I'm hitting:

package main

import (
	"context"
	"fmt"

	"github.com/stripe/skycfg"
)

func main() {
	ctx := context.Background()
	config, err := skycfg.Load(ctx, "hello.sky")
	if err != nil { panic(err) }
	messages, err := config.Main(ctx)
	if err != nil { panic(err) }
	for _, msg := range messages {
		fmt.Printf("%s\n", msg.String())
	}
}
# hello.sky
pb = proto.package("google.protobuf")

def main(ctx):
  return [pb.Value(
    bool_value = True,
    string_value = "foo"
  )]

Note, Value contains a oneof - https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto#L62
Repeatedly invocating the code now returns non-deterministic results:

$ for x in {1..5}; do ./test-skycfg; done
string_value:"foo"
bool_value:true
bool_value:true
string_value:"foo"
bool_value:true

Looks like the non-determinism is in part due to that fact that the fields of the returned proto object are stored internally in a go map[string]starlark.Value which has non-deterministic ordering, and the oneof field that is processed last wins - https://github.com/stripe/skycfg/blob/trunk/go/protomodule/protomodule_message.go#L279

Looks like the intent was to mimic the behavior in dynamicpb - https://github.com/stripe/skycfg/pull/90/files#r692217496

I decided to compare this behavior with pure python (as it is also dynamically typed and closely resembles starlark), and although you can define something similar:

> import google.protobuf.struct_pb2 as pb
> print(pb.Value(string_value="foo", bool_value=True))
bool_value: true  # deterministic

The output is deterministic (I suspect because python 3.7 introduced dictionaries that maintained insertion order https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6)

Are the authors open to adding determinism here? This can result in hard to reason about configurations due to inadvertent user error. One naive solution would be to sort fields by name to ensure some type of deterministic ordering.