More robust transcoding of `google.protobuf.Any` error details
jhump opened this issue.
RPC errors can include arbitrary additional details in the form of `google.protobuf.Any` messages. These messages are generally transmitted in the Protobuf binary format, and the in-memory format of a de-serialized `google.protobuf.Any` message also contains bytes in the Protobuf binary format.
When translating errors into responses for REST clients, the error is transcoded into JSON. This means that any error details will also be transcoded to JSON. Furthermore, when receiving an error body from a REST server, we first try to de-serialize the body as a JSON-formatted RPC error. This requires transcoding in the other direction, from JSON to the Protobuf binary format.
The issue is that transcoding a `google.protobuf.Any` message from binary to JSON, or vice versa, requires the schema for the message type it contains. So there is a class of failures where an RPC error includes an error detail whose message type is not recognized. When that happens, transcoding fails because no schema is available to guide the transcoding of that message type.
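To make this class of failures concrete, here is a minimal, stdlib-only Go sketch; it is not Vanguard's actual code. It models an `Any` as a type URL plus opaque binary bytes, and models "having the schema" as a lookup in a hypothetical `registry` map. The names `anyMsg`, `status`, `registry`, `marshalStatus`, and the `acme.Unknown` type are all illustrative. Because the whole status is marshalled in one step, a single unrecognized detail fails the entire operation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// anyMsg is a simplified stand-in for google.protobuf.Any:
// a type URL plus the message serialized in Protobuf binary format.
type anyMsg struct {
	TypeURL string
	Value   []byte
}

// status is a simplified stand-in for an RPC error (google.rpc.Status).
type status struct {
	Code    int
	Message string
	Details []anyMsg
}

// toJSONFunc converts one binary-encoded message to its JSON form.
// Producing one requires knowing the message's schema.
type toJSONFunc func(binary []byte) (map[string]any, error)

// registry is a hypothetical schema lookup keyed by type URL.
var registry = map[string]toJSONFunc{}

// marshalStatus transcodes the whole status at once: if any detail's
// schema is missing, the entire operation fails.
func marshalStatus(s status) ([]byte, error) {
	details := make([]map[string]any, 0, len(s.Details))
	for _, d := range s.Details {
		conv, ok := registry[d.TypeURL]
		if !ok {
			return nil, fmt.Errorf("unknown message type %q", d.TypeURL)
		}
		obj, err := conv(d.Value)
		if err != nil {
			return nil, err
		}
		obj["@type"] = d.TypeURL
		details = append(details, obj)
	}
	return json.Marshal(map[string]any{
		"code":    s.Code,
		"message": s.Message,
		"details": details,
	})
}

func main() {
	s := status{
		Code:    5,
		Message: "not found",
		Details: []anyMsg{{TypeURL: "type.googleapis.com/acme.Unknown", Value: []byte{0x08, 0x01}}},
	}
	_, err := marshalStatus(s)
	fmt.Println(err)
}
```

Running it prints `unknown message type "type.googleapis.com/acme.Unknown"`: the code and message are discarded along with the one problematic detail.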
Today, when this happens:
- If we cannot transcode to JSON for a REST client, the client instead receives this less-than-helpful error: `{"code":13,"message":"failed to marshal end error"}`.
- If we cannot transcode from JSON an error body received from a REST server, we synthesize an RPC error with a code based on the HTTP status code and a `google.api.HttpBody` error detail message that contains the error body that could not be transcoded, along with its content-type.
Similar software has similar issues:
- In the gRPC-JSON transcoder filter for Envoy, if this happens, no error body is sent; instead, the error details remain only in the `grpc-status`, `grpc-message`, and/or `grpc-status-details-bin` response headers or trailers. (This is likely even less useful to a REST client than Vanguard's behavior described above.)
- In gRPC-Gateway, when this happens, the error body is hardcoded to `{"error": "failed to marshal error message"}` (not too far from Vanguard's default behavior). However, gRPC-Gateway allows the application to configure a custom error handler, which could potentially handle such marshalling errors in a much more robust fashion.
Ideally, when the only reason we'd fail to transcode an RPC error to or from JSON is missing schemata for the error details therein, we would produce a slightly different error format: one that preserves as much of the original error data as possible. Here are some concrete tactics we could take:
- Instead of attempting to marshal or unmarshal the whole RPC status to or from JSON in one step, we should process it in a custom fashion where error details are transcoded one at a time. This allows the processing to detect if and when a single error detail cannot be processed and to use a fallback strategy for just that message. This preserves the status code and message, and possibly even a subset of the error details, even if there is an unrecognized error detail message.
- When there is an unknown error detail and we are transcoding to JSON, serialize the message to a JSON object with `@type` and `@value` keys, where the `@value` key is a base64-encoded string of the binary bytes that cannot otherwise be transcoded.
  - Ideally, support for this special form would be symmetric: when transcoding an error detail from JSON, if we see that it has just `@type` and `@value` properties and we can't otherwise unmarshal the `Any` message, then try to base64-decode the `@value` property and construct the `Any` error detail from that.
- When there is an unknown error detail and we are transcoding from JSON (and the `@type`/`@value` form does not apply), de-serialize the message as a `google.protobuf.Struct` message and then append that to the RPC error details.
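The per-detail tactics can be sketched in Go as follows. This is a hedged, stdlib-only illustration, not a proposed Vanguard API: `anyMsg` stands in for `google.protobuf.Any`, and the schema-driven transcoders are hypothetical callbacks that return `errUnknownType` when no schema is available. `detailToJSON` falls back to the `@type`/`@value` form; `detailFromJSON` tries schema-driven decoding, then the `@value` form, and finally keeps the raw JSON object as a stand-in for a `google.protobuf.Struct` detail. In real code built on `google.golang.org/protobuf`, that last step might use `structpb.NewStruct` and `anypb.New`; that wiring is omitted here.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
)

// anyMsg is a simplified stand-in for google.protobuf.Any: a type URL plus
// the message serialized in Protobuf binary format.
type anyMsg struct {
	TypeURL string
	Value   []byte
}

// errUnknownType is returned by the (hypothetical) schema-driven transcoders
// when no schema is available for a detail's type URL.
var errUnknownType = errors.New("unknown message type")

// Hypothetical schema-driven transcoders for a single detail.
type (
	toJSONFunc   func(anyMsg) (map[string]any, error)
	fromJSONFunc func(map[string]any) (anyMsg, error)
)

// detailToJSON transcodes one detail, falling back to the
// {"@type": ..., "@value": <base64 of binary bytes>} form when the schema is missing.
func detailToJSON(d anyMsg, conv toJSONFunc) (map[string]any, error) {
	obj, err := conv(d)
	if errors.Is(err, errUnknownType) {
		return map[string]any{
			"@type":  d.TypeURL,
			"@value": base64.StdEncoding.EncodeToString(d.Value),
		}, nil
	}
	return obj, err
}

// decoded holds exactly one result of detailFromJSON.
type decoded struct {
	Any    *anyMsg        // recovered Any (schema-driven or via "@value")
	Struct map[string]any // last resort: to be stored as a google.protobuf.Struct
}

// detailFromJSON is the symmetric operation: schema-driven decoding first,
// then the "@type"/"@value" form, then the Struct fallback.
func detailFromJSON(obj map[string]any, conv fromJSONFunc) (decoded, error) {
	a, err := conv(obj)
	if err == nil {
		return decoded{Any: &a}, nil
	}
	if !errors.Is(err, errUnknownType) {
		return decoded{}, err
	}
	// Only an object with exactly the two keys "@type" and "@value" qualifies.
	typeURL, okType := obj["@type"].(string)
	value, okValue := obj["@value"].(string)
	if len(obj) == 2 && okType && okValue {
		if raw, decErr := base64.StdEncoding.DecodeString(value); decErr == nil {
			return decoded{Any: &anyMsg{TypeURL: typeURL, Value: raw}}, nil
		}
	}
	return decoded{Struct: obj}, nil
}

func main() {
	// In this demo, no schemas are available at all.
	noSchemaTo := func(anyMsg) (map[string]any, error) { return nil, errUnknownType }
	noSchemaFrom := func(map[string]any) (anyMsg, error) { return anyMsg{}, errUnknownType }

	in := anyMsg{TypeURL: "type.googleapis.com/acme.Unknown", Value: []byte{0x08, 0x01}}
	obj, _ := detailToJSON(in, noSchemaTo)
	out, _ := json.Marshal(obj)
	fmt.Println(string(out))

	back, _ := detailFromJSON(obj, noSchemaFrom)
	fmt.Println(back.Any.TypeURL, back.Any.Value)
}
```

With no schemas available, the hypothetical `acme.Unknown` detail is written as `{"@type":"type.googleapis.com/acme.Unknown","@value":"CAE="}` and read back as the original type URL and bytes `[8 1]`. Because each detail is handled on its own, a caller can assemble the status's `details` array from these results while keeping `code` and `message` intact.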
Also, it's worth noting that gRPC-Gateway synthesizes an additional `error` string field in the JSON form of the RPC error, because that is a very common part of JSON error representations expected by REST clients. It could be useful for Vanguard to do the same. For instance, we could put the string form of `code` there, which satisfies the expected shape (a string field named `error`) and also makes the error more human-readable without the human having to memorize the 16 RPC codes.