mransan/ocaml-protoc

Why oneof is not wrapped in option?

Lupus opened this issue ยท 9 comments

According to [1]:

Oneof fields are encoded the same as if the fields were not in a oneof. The rules that apply to oneofs are independent of how they are represented on the wire.

Link from quote above leads to another page [2], that tells us that:

Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen language.

It seems that runtimes for other languages allow to check if oneof was actually set (i.e. there was at least one field from oneof on the wire).

ocaml-protoc seems to generate the variant for oneof, and then embed it directly into message type without any option. When parsing this, it defaults to first constructor with all-default content inside... Doesn't it make sense to wrap that in option, and if none of oneof fields actually arrive on the wire - pass that to application as None to conveniently handle this case instead of matching first constructor with default values and assuming it was "not sent"?

[1] https://protobuf.dev/programming-guides/encoding/#oneofs
[2] https://protobuf.dev/programming-guides/proto2#oneof

Sadly proto2 is still around a lot of places and is unlikely to disappear. All custom options are defined in proto2 as extensions to google.protobuf.MessageOptions, google.protobuf.OneofOptions or google.protobuf.FieldOptions, see https://github.com/bufbuild/protoc-gen-validate/blob/main/validate/validate.proto for example... Protoc plugins get their input as binary protobuf, encoded according to proto2 schema, so it's there for ethernity.

I'm trying an ugly hack of exporting Pb_option.set into JSON since the representations are quite similar, and decoding them via code generated out ot validate.proto. Dropped extend in favor of ordinary message so that I get actual decoders, it worked reasonably well, until I looked at the dump in more detail and noticed that oneofs are decoded as first variant, which made me curious about this, hence this discussion ๐Ÿ˜Š

I'm reading the encoding section of the docs, which seems to apply both for proto2 and proto3, and as I cited above, it says that one-of does not have any special rules for encoding on the wire. The proto3 docs link that you shared stresses (if any) for oneof value, meaning there can be no value.

Golang language guide on oneof officially encodes non-set one-ofs as nils and suggests to check that to figure out if it was set or not.

Java language guide on oneof adds a special value XXXXXX_NOT_SET(0) to enum to indicate that oneof was not set.

If wire encoding of one-of is exactly the same as individual fields inside, and it's up to the higher level to decide on the API, and both Golang and Java offer an official way to learn that oneof was not specified on the wire, why can't we do this in an idiomatic way with OCaml implementation by wrapping oneofs into options?

Both Go and Java have implicit nullability, which OCaml doesn't have. But I hear your point. Looking at opentelemetry's proto3 files, I see at least one oneof which can be explicitly omitted; and one where the first value can be used as default.

Another important quirk of ocaml-protoc is that, with proto3, you don't know which fields have been set. I'm not sure how it's done in, say, Go, but it is related. I can imagine us having a CLI flag that makes optionality more explicit (adding options to fields, or adding a 0th-case Foo_not_set to sum type foo). What do you think?

Both Go and Java have implicit nullability, which OCaml doesn't have.

Okay, let's look at protobuf implementation for a decent language wich lacks implicit nullability :) Will Rust work for this purpose? Quote from prost documentation on oneof-fields: "oneof fields are always wrapped in an Option."

Java adds additional enum variant to specify that oneof was not actually set on the wire, this has nothing to do with implicit nullability I believe. And for Go we have official Google documentation that explicitly states how implicit nullability is used to encode lacking oneof fields. And for Rust implementation we have a clear "always wrapped in an Option".

Another important quirk of ocaml-protoc is that, with proto3, you don't know which fields have been set. I'm not sure how it's done in, say, Go, but it is related.

As far as I know, messages are encoded as a list of key value pairs on the wire. Oneof fields do not change wire encoding, so members of oneof are encoded as ordinary fields within enclosing message. The only difference is that decoder needs to apply "last write wins" semantic to figure out which oneof member is actually set - they share the same memory in target language representation (e.g. encoded as a union in C), so we have to know exactly which of oneof members was actually set. Defaults are typically not sent on the wire with protobuf, right, but proto 3 added optional keyword so that it's possible to distinguish between "we received a default value that was omitted on the wire" and "other side didn't intend to set this field at all". If we apply "defaults are always omitted on the wire so assume they were sent" to oneof which was not set, we should interpret the 0-byte payload as a sequence of all default-but-not-sent oneof members, how should we apply "last write wins" here?

In generated Golang protobuf code for oneofs, there is a clear way to specify which oneof member is set and what is the value, even for primitive types like integers, as protobuf generator wraps oneof members with structs that implement interfaces so that implicit nullability can be leveraged to mimic algebraic data type semantic - know which case is set and have only one such case be set at the same time.

I can imagine us having a CLI flag that makes optionality more explicit (adding options to fields, or adding a 0th-case Foo_not_set to sum type foo). What do you think?

You mean making option-wrapping behavior for oneof fields an opt-in, controlled by command line flag? That could be a start, but it seems that current behavior is not aligned with other implementations. We could do a test probably, encoding some oneof with no fields with somewhat official implementation like Golang, decoding it back in Golang and with code generated by ocaml-protoc. If semantics will differ, as in: Golang would decode as "oneof not set" and ocaml-protoc would decode as "first constructor with all-defaults", we can probably conclude that ocaml-protoc has incompatibility with official tooling.

I think you're right overall, an option would be the more correct behavior. It does make the types more annoying to manipulate (since everything becomes an option, except for repeated fields), but if it's the price to pay for being compliant (and sending less data over the wire, too) then so be it.

You mean making option-wrapping behavior for oneof fields an opt-in, controlled by command line flag? That could be a start, but it seems that current behavior is not aligned with other implementations.

Yes, the main thing with this is that this change would prompt another major release. Why not, I suppose, but it'd be nice to have at least one transition release where this is opt-in instead of opt-out (or no opt-out at all).

This is so absurd though, they removed required and now it's back in the form of an option ๐Ÿ˜… . Oh well, why not, if we assume that validation constraints are always checked before returning a decoded message.

I've been thinking that maybe the mutable messages should be public (and always with option in fields, to reflect what does on the wire). The non-mutable ones can skip options if validation permits it. My other use case for exposing mutable messages is to reduce allocations when writing messages (if the mutable message is reused).

In a way it's the same idea, but where the current type of non-validated messages becomes validated (with additional constraints including non nullability); and the current mutable messages become public and are not validated.