bufbuild/protocompile

Proposal: explore using google.golang.org/protobuf/reflect/protoreflect descriptors instead of custom descriptors

lfolger opened this issue · 5 comments

In #258 I made a proof of concept for using google.golang.org/protobuf/reflect/protoreflect descriptors instead of the descriptors defined in linker/descriptors.go.
google.golang.org/protobuf/reflect/protodesc offers an API to turn file descriptor protos into protobuf/protoreflect descriptors. To be able to use this API on the descriptors generated by the parser, the well-known options need to be resolved first.

This means the current execution needs to change from:

  1. parsing
  2. linking (done by code in linker/)
  3. resolving options

to

  1. parsing
  2. resolving well-known options
  3. linking (done by reflect/protodesc)
  4. resolving not well-known options

In my experiments this was all that was needed but I don't know how good the test coverage of my experiments was.

Another issue might be that the reflect/protodesc validation is stricter than the linker/descriptors.go validation and thus the API would become more restrictive (backwards incompatible change).

I'm not saying this is the only (or even the best) way forward, but it might be an option to consider to reduce the maintenance cost, because protocompile would no longer need to maintain its own descriptor implementation.

@lfolger, thanks for filing this!

I had in fact considered that alternate proposed flow when originally implementing this. There's a little more context in this question that I filed in the protobuf-go repo at the time.

The main issue will be with supporting the user providing a custom/modified/etc version of google/protobuf/descriptor.proto. This is mainly complicated by the new features option everywhere, for editions, which is a message. So we'd need a dynamic message for google.protobuf.FeatureSet in order to interpret non-custom options that include features, which poses a bootstrapping issue: where do we get the descriptor for FeatureSet when we are still in the middle of compiling the source that defines it?

There's also the performance issue of having to create the descriptors twice: once before we interpret custom options, and then again after custom options are interpreted and source code info is generated. Or we could continue to have a custom implementation of the interfaces in this repo, but have them be much leaner, where they only override the Options() methods (to return the most recent options message, with custom options interpreted) and the SourceLocations() method on FileDescriptor.

More exploration is needed to figure out how to remove these custom implementations w/out loss of functionality and (maybe more importantly) without a regression in performance in terms of both throughput and memory usage.

Another issue might be that the reflect/protodesc validation is stricter than the linker/descriptors.go validation and thus the API would become more restrictive (backwards incompatible change).

This package intends to perform all of the same validation that protoc does. I do happen to know of at least one case where the reflect/protodesc package is a little more strict than protoc, and it's with messages using the message set wire format. But, frankly, no one should be using that, so it's certainly not a problem in practice. In fact, I actually made protocompile enforce the same rules as reflect/protodesc, so protocompile is also a little bit more strict than protoc when it comes to that. (If you're curious, the extra check that protocompile and reflect/protodesc perform is to require that a message with message set wire format has at least one extension range; protoc doesn't care and will allow a completely empty message to have message set wire format.)

The main issue will be with supporting the user providing a custom/modified/etc version of google/protobuf/descriptor.proto.

Does this mean you are allowing users to redefine the descriptor proto, including feature defaults?
This seems to be a problem for the protobuf implementation, as many of the features and their defaults are baked into the binaries and are not resolved at compile time or runtime.

In general it seems problematic if the .proto version of descriptor.proto is not in sync with the generated .pb.go file that is baked into the binary (especially if the .proto file is newer), but I might be missing something here.

Does this mean you are allowing users to redefine the descriptor proto, including feature defaults?

Yes. protoc supports this as well. In particular, it is useful for pinning a prior or future version of descriptor.proto -- by future I mean a version of descriptor.proto from a newer version of protoc than what is "baked" into the compiler.

This seems to be a problem for the protobuf implementation as many of the features and their defaults are baked into the binaries and are not resolved at compile/runtime.

For the generated code, that is always the case. But this is a compiler. It does not necessarily deal only with statically generated code.

In general it seems problematic if the .proto version of the descriptor.proto is not in sync with the generated .pb.go file

FWIW, that happens all the time in practice, as protobuf-go and protoc are not atomically released together. But also, this feature of the compiler can even be necessary to achieve the version matching you're talking about: if the compiler binary contains a different embedded version of descriptor.proto than the runtime library and generated code expect, we have to be able to provide the right version.

Thanks for clarifying. I think I misunderstood what you said initially.

It seems this is limited to released versions of descriptor.proto, not arbitrary modifications to it. And for the most part it also means (assuming you are using an up-to-date version) that Go protobuf understands all of these versions, because it is updated regularly.

This is superseded by #321.