protocolbuffers/protobuf

Add Support for Cyclic Objects

mdavisJr opened this issue · 6 comments

I'm getting the following error "Process is terminating due to StackOverflowException." when trying to process the following schema(with a cycle) through gRPC.

syntax = "proto3";

package client;

service Greeter {
  rpc FindClient (ClientRequest) returns (ClientReply) {}
}

message ClientRequest {
  int64 id = 1;
}

message ClientReply {
  int64 id = 1;
  string name = 2;
  ClientReply parent = 3;
  repeated ClientReply children = 4;
}

I feel that my example above is a good real life example and should be supported by Protobuf/gRPC. In addition, I feel there are many applications out there that have a similar structure. Moreover, it's hard to avoid cyclic objects if you use EntityFramework(C#) and have a bidirectional relationship, for instance, the following code below would return an cyclic object. More details on this can be found at dotnet/efcore#11564.

db.Clients.Where(x => x.Id = 3).Include("Children").Single();

I understand that when returning a response to the browser you should use a DTO, not expose your domain model, and make the response as light as possible, but in the context of gRPC I feel that you should be able to pass bigger object graphs and cyclic objects between your internal microservices.

I'm new to Protobuf and don't quite understand how Protobuf stores data in binary form so sorry if this is a bad reference, but there are some JSON frameworks that have added support for cyclic objects:

https://www.newtonsoft.com/json/help/html/PreserveReferencesHandlingObject.htm
https://dojotoolkit.org/reference-guide/1.10/dojox/json/ref.html
https://www.npmjs.com/package/cycle
https://github.com/WebReflection/circular-json/#circularjson
https://www.npmjs.com/package/jspon
I also think xml added support for handling objects by reference with ID/IDREF attributes.

I know handling objects(messages) by reference could add some complexity but it could also simplify in some areas. It could make the size of the data smaller by never serializing the same object more than once. It could also speed up the serializing because instead of serializing the same object that has 80 properties in it again, you would just detect that you have already processed the object before and create a reference pointer. Like I said before, I'm new to Protobuf so don't know if it is possible to handle objects(messages) by reference or not, but feel this would be a nice feature to add to Protobuf especially as gRPC grows.

I did my testing in C#, but would like this change made across the board.

I have a proof of concept working and wanted to get your thoughts. If a user has handle-messages-by-reference option turned on, embedded messages would have 2 forms--pointer form and message form. The embedded message form would be decided at runtime when traversing the object graph. While traversing the object graph, objects processed for the first time would be written out in message form. On the other hand, objects that have already been processed before would be written out in pointer form.

Pointer form would look like:

image

Message form would look like:

image

This would be a very intrusive change: it would require changing the wire format and all implementations. We have never made a change of this scale, and it would take an enormous benefit to justify the cost. It is very unlikely we would accept a change of this kind, sorry.

I recommend adding ids to your protobuf schema to represent object references if you want to have cyclic object graphs.

Thanks for the response.

Could you explain more on how the following idea would work "I recommend adding ids to your protobuf schema to represent object references if you want to have cyclic object graphs."

Basically I am echoing what you said about the difference between "message form" and "pointer form." If you give each object a unique id (ie. pointer) you can encode these in normal protobuf messages as integers. Then your object graphs can support cycles even if protobuf messages themselves do not.

For example you might use something like this:

oneof object {
  FooMessage message = 1;
  uint32 id = 2;
}

For anyone that found this after the fact like me. Don't forget that C# classes that are created by protocol buffers are sealed partial class which means that it cannot be inherited, but you can add to the class with more partial classes. This would help solve any DbContext mapping issues or wanting a virtual property.

This was one of the best features of Macromedia's AMF protocol released in 2002 with Flash Player 6. Once an object of any type was serialized, any future instances seen in the graph were encoded by reference instead of by value. This was brilliant, and supported very complex object graphs.