Non-SelfDescribingErrors mode: add new wire types for "ERROR" and "PATH"
Closed this issue · 2 comments
TL;DR: I'm proposing that Argo add a PATH
wire type and an ERROR
wire type. See examples (C) and (D) below.
Why?
The current reference implementation of Argo only supports SelfDescribingErrors
and OutOfBandFieldErrors
.
The purpose of this issue is to explore and discuss what non-self-describing errors might look like in their encoded form and whether having additional wire types for PATH
and ERROR
would be beneficial.
All of the examples below use the following JSON value as an example GraphQL response:
{
"data": {
"a": "i",
"b": [{"x": {"y": null}}]
},
"errors": [
{
"message": "b-x-y-error",
"location": [{"line": 5, "column": 12}],
"path": ["b", 0, "x", "y"],
"extensions": {"z": 1}
}
]
}
The original GraphQL query isn't important, but it could be imagined to look like this:
query {
a
b {
x {
y
}
}
}
(A) Error type as DESC
Reference implementation wire type (using DESC
for the Error type):
{
data: {
a?: STRING
b: {
x: {
y: STRING?
}
}[]?
}
errors?: DESC[]?
}
Encoding with InlineEverything
, OutOfBandFieldErrors
, and SelfDescribingErrors
enabled:
# 97-bytes Base-16
0d00026900020100000204080e6d6573736167650816622d782d792d6572726f72106c6f636174696f6e06020404086c696e650c0a0c636f6c756d6e0c18087061746806080802620c0008027808027914657874656e73696f6e730402027a0c02
# 97-bytes Escaped
\r\0\x02i\0\x02\x01\0\0\x02\x04\x08\x0emessage\x08\x16b-x-y-error\x10location\x06\x02\x04\x04\x08line\x0c\n\x0ccolumn\x0c\x18\x08path\x06\x08\x08\x02b\x0c\0\x08\x02x\x08\x02y\x14extensions\x04\x02\x02z\x0c\x02
(B) Error type as RECORD
with STRING[]
path
Generally follows section 5.8.1 from the Argo 1.0.0 spec:
{
data: {
a?: STRING
b: {
x: {
y: STRING?
}
}[]?
}
errors?: {
message: STRING
location?: {
line: VARINT
column: VARINT
}[]
path?: STRING[]
extensions?: DESC
}[]?
}
Encoding with InlineEverything
and OutOfBandFieldErrors
enabled:
# 43-bytes Base-16
0500026900020100000216622d782d792d6572726f7200020a1800080262023002780279000402027a0c02
# 43-bytes Escaped
\x05\0\x02i\0\x02\x01\0\0\x02\x16b-x-y-error\0\x02\n\x18\0\x08\x02b\x020\x02x\x02y\0\x04\x02\x02z\x0c\x02
(C) Error type as RECORD
with PATH
path
New proposed PATH
type instead of STRING[]
:
{
data: {
a?: STRING
b: {
x: {
y: STRING?
}
}[]?
}
errors?: {
message: STRING
location?: {
line: VARINT
column: VARINT
}[]
path?: PATH
extensions?: DESC
}[]?
}
Encoding with InlineEverything
and OutOfBandFieldErrors
enabled:
# 43-bytes Base-16
0500026900020100000216622d782d792d6572726f7200020a1800080262000002780279000402027a0c02
# 43-bytes Escaped
\x05\0\x02i\0\x02\x01\0\0\x02\x16b-x-y-error\0\x02\n\x18\0\x08\x02b\0\0\x02x\x02y\0\x04\x02\x02z\x0c\x02
The main difference here being that path segments may only be one of the following:
enum PathSegment {
FieldName(NonEmptyString),
ListIndex(UnsignedInteger),
}
It's expected that most path segments will be a non-empty String for a field name or alias, so by enforcing that the length of a FieldName
segment is non-zero, we can use the NON_NULL
label (which is 0
) as a reserved label to indicate that a VARINT
for a ListIndex
is expected next instead of a String.
So, using the example from (B), the zero 0
path segment encoding from ["b", 0, "x", "y"]
would normally encode as a String:
{Varint(1), "0"}
With the PATH
type, it would be encoded as follows:
{Varint(0), Varint(0)}
Where the first Varint()
refers to the label NON_NULL
.
For single field errors, the difference between PATH
and STRING[]
is negligible and should normally result in same-size or slightly smaller encodings compared to the STRING[]
variant (cases where a ListIndex
with an integer value larger than 9 would result in a smaller encoding).
More benefits from PATH
will be described in more detail in a future issue related to @defer
and @stream
support, which heavily relies on the PATH
type.
(D) Error type as ERROR
New proposed ERROR
type, which is identical to (C) above, but can be re-used wherever the Error type may need to be used:
{
data: {
a?: STRING
b: {
x: {
y: STRING?
}
}[]?
}
errors?: ERROR[]?
}
Encoding with InlineEverything
and OutOfBandFieldErrors
enabled:
# 43-bytes Base-16
0500026900020100000216622d782d792d6572726f7200020a1800080262000002780279000402027a0c02
# 43-bytes Escaped
\x05\0\x02i\0\x02\x01\0\0\x02\x16b-x-y-error\0\x02\n\x18\0\x08\x02b\0\0\x02x\x02y\0\x04\x02\x02z\x0c\x02
TODO: (E) Field errors encoding (not yet implemented)
For non-OutOfBandFieldErrors
mode, the ERROR
label could be written to an otherwise null
field and immediately be followed by a list(?) of ERROR[]
.
I don't think the GraphQL specification limits how many errors might be returned for a particular path
, so I'm assuming Argo would need to support a list of field errors.
It might be possible to drop the path
field when encoded as a field error. This is assuming that the path
could be reconstructed by walking the decoded field itself in combination with the given wire type. For field errors involving arrays, this might work, but for incremental updates it's unclear whether path
may be omitted or not.
Thanks for opening this issue Andrew!
Here are a few initial thoughts on A-E, as well as a new thought F:
(A) Error type as DESC
The difference in size seems ok to me for traditional request/response workflows, but I do understand that DESC is a bit annoying and unsatisfying, and that we're building up to stream/defer use cases, so I'll leave this one alone.
(B) Error type as RECORD with STRING[] path
The upshot here is relative simplicity, relatively easy debuggability, and easy pairing with BLOCK-deduping for path elements (which seems likely to help quite a bit in stream/defer).
The downside is in practice, the string paths could be large-ish---certainly they might dominate the payload size, but in these cases the total payloads are small (say, hundreds of bytes).
(C) Error type as RECORD with PATH path
Not a bad idea. It introduces an Argo-native concept to the strange path encodings demanded by the GraphQL spec, which might be helpful.
The use of 0 is clever and seems unlikely to be a real problem, but it does muddy up the meaning of Labels just a bit, and it introduces a new concept to Argo (basically, tagged unions).
(D) Error type as ERROR
There might be some upshot in having names for common types, but I think it's preferable to achieve this without introducing new kinds of Wire types--this way, code that handles wire Types doesn't need to know about ERROR specifically.
Contrasting with PATH, the case seems less strong for introducing a first-class concept/type.
(E) Field errors encoding (not yet implemented)
Yeah, this is described in 5.8.3 Field errors, and you need a partial path to know where the errors bubbled up from (if originating in a nested non-nullable field). However, I expect the space saving is unimportant for most traditional request/response use cases, and it seems somewhat unlikely to be possible in stream/defer. I think that means the approaches other than E are more worth looking into.
You make a good point about multiple errors, I should look into whether it ought to be an inline array of error values.
(F) Error type as RECORD
with VARINT[]
path
I was thinking about how this might be made maximally compact without making things too complex, and I had this idea:
all the Wire types have a deterministic order already, so maybe we could transform a GraphQL spec-compliant path to an array of VARINT by replacing each field name with the index of the corresponding field in its parent RECORD. (Array indices can be left alone.) This could be reconstructed on the other side given what we know from Wire types. These integers would typically be very small (growing only with the number of fields in each RECORD), so we could expect most Path values to use roughly 1 byte per level of nesting.
It's an extra step, but if PATH size is important for stream/defer, it might be helpful. We may still want an explicit PATH type.
You can see a prototype in the reference implementation on the integer-paths
branch: https://github.com/msolomon/argo/compare/integer-paths
Let me know what you think!
I like (F), I think it has much better potential for savings compared to (B) and (C). I think (E) is most useful for codegen purposes, but otherwise I agree that having it as its own wire type doesn't provide anything new.
I'll put together another issue soon-ish with more details about defer/stream, which is where I'm hoping the optimized PATH type (F) will be more useful.