Non-SelfDescribingErrors mode: add new wire types for "ERROR" and "PATH"

Question

Closed this issue 9 months ago · 2 comments

TL;DR: I'm proposing that Argo add a PATH wire type and an ERROR wire type. See examples (C) and (D) below.

Why?

The current reference implementation of Argo only supports SelfDescribingErrors and OutOfBandFieldErrors.

The purpose of this issue is to explore and discuss what non-self-describing errors might look like in their encoded form and whether having additional wire types for PATH and ERROR would be beneficial.

All of the examples below use the following JSON value as an example GraphQL response:

{
    "data": {
        "a": "i",
        "b": [{"x": {"y": null}}]
    },
    "errors": [
        {
            "message": "b-x-y-error",
            "location": [{"line": 5, "column": 12}],
            "path": ["b", 0, "x", "y"],
            "extensions": {"z": 1}
        }
    ]
}

The original GraphQL query isn't important, but it could be imagined to look like this:

query {
    a
    b {
        x {
            y
        }
    }
}

(A) Error type as `DESC`

Reference implementation wire type (using DESC for the Error type):

{
  data: {
    a?: STRING
    b: {
      x: {
        y: STRING?
      }
    }[]?
  }
  errors?: DESC[]?
}

Encoding with InlineEverything, OutOfBandFieldErrors, and SelfDescribingErrors enabled:

# 97-bytes Base-16
0d00026900020100000204080e6d6573736167650816622d782d792d6572726f72106c6f636174696f6e06020404086c696e650c0a0c636f6c756d6e0c18087061746806080802620c0008027808027914657874656e73696f6e730402027a0c02

# 97-bytes Escaped
\r\0\x02i\0\x02\x01\0\0\x02\x04\x08\x0emessage\x08\x16b-x-y-error\x10location\x06\x02\x04\x04\x08line\x0c\n\x0ccolumn\x0c\x18\x08path\x06\x08\x08\x02b\x0c\0\x08\x02x\x08\x02y\x14extensions\x04\x02\x02z\x0c\x02
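The first byte of each encoding differs between (A) and the later examples (0x0d versus 0x05), which lines up with the modes listed for each one. A minimal sketch of reading that byte as a bit set follows. The bit positions are an assumption inferred from these two examples (bit 0, bit 2, bit 3); other header bits are deliberately left out, and `decode_flags` is a hypothetical helper name.

```python
# Assumed bit positions, inferred from the 0x0d and 0x05 header bytes in
# this issue. Only the three modes mentioned here are listed.
FLAG_BITS = {
    0: "InlineEverything",
    2: "OutOfBandFieldErrors",
    3: "SelfDescribingErrors",
}


def decode_flags(first_byte: int) -> list:
    """Return the names of the known flags set in the header byte."""
    return [name for bit, name in FLAG_BITS.items() if first_byte & (1 << bit)]


header_a = 0x0D  # first byte of the 97-byte encoding in (A)
header_b = 0x05  # first byte of the 43-byte encodings in (B), (C), (D)

print(decode_flags(header_a))
print(decode_flags(header_b))
```

Under that assumption, the only difference between the two headers is the SelfDescribingErrors bit.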

(B) Error type as `RECORD` with `STRING[]` path

Generally follows section 5.8.1 from the Argo 1.0.0 spec:

{
  data: {
    a?: STRING
    b: {
      x: {
        y: STRING?
      }
    }[]?
  }
  errors?: {
    message: STRING
    location?: {
      line: VARINT
      column: VARINT
    }[]
    path?: STRING[]
    extensions?: DESC
  }[]?
}

Encoding with InlineEverything and OutOfBandFieldErrors enabled:

# 43-bytes Base-16
0500026900020100000216622d782d792d6572726f7200020a1800080262023002780279000402027a0c02

# 43-bytes Escaped
\x05\0\x02i\0\x02\x01\0\0\x02\x16b-x-y-error\0\x02\n\x18\0\x08\x02b\x020\x02x\x02y\0\x04\x02\x02z\x0c\x02
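To make the 43 bytes easier to read, here is a rough sketch that walks the error record at the end of the encoding. It assumes integers and length labels are zigzag varints (0x16 is length 11, 0x0a is 5), and it reads the 0x00 in front of each optional field as a "present" marker. That reading is an interpretation of the bytes above, not something taken from the spec, and all function names are hypothetical.

```python
def read_varint(buf: bytes, pos: int):
    """Read one zigzag-encoded varint; return (value, next position)."""
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (raw >> 1) ^ -(raw & 1), pos


def read_string(buf: bytes, pos: int):
    """Read a length-prefixed UTF-8 string; return (text, next position)."""
    length, pos = read_varint(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


def expect_present(buf: bytes, pos: int) -> int:
    """Consume the assumed 'present' marker (label 0) of an optional field."""
    label, pos = read_varint(buf, pos)
    if label != 0:
        raise ValueError(f"expected label 0, got {label}")
    return pos


def decode_error(buf: bytes, pos: int) -> dict:
    message, pos = read_string(buf, pos)
    pos = expect_present(buf, pos)            # location?
    count, pos = read_varint(buf, pos)
    location = []
    for _ in range(count):
        line, pos = read_varint(buf, pos)
        column, pos = read_varint(buf, pos)
        location.append({"line": line, "column": column})
    pos = expect_present(buf, pos)            # path?
    count, pos = read_varint(buf, pos)
    path = []
    for _ in range(count):
        segment, pos = read_string(buf, pos)
        path.append(segment)
    # The remaining bytes are the extensions field, left undecoded here.
    return {"message": message, "location": location, "path": path,
            "rest": buf[pos:]}


ENCODED_B = bytes.fromhex(
    "0500026900020100000216622d782d792d6572726f72"
    "00020a1800080262023002780279000402027a0c02"
)
# Offset 10 skips the header byte, the data record, and the errors label.
error = decode_error(ENCODED_B, 10)
print(error["message"], error["location"], error["path"])
```

Note that the list index comes back as the string "0", which is the detail (C) sets out to change.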

(C) Error type as `RECORD` with `PATH` path

New proposed PATH type instead of STRING[]:

{
  data: {
    a?: STRING
    b: {
      x: {
        y: STRING?
      }
    }[]?
  }
  errors?: {
    message: STRING
    location?: {
      line: VARINT
      column: VARINT
    }[]
    path?: PATH
    extensions?: DESC
  }[]?
}

Encoding with InlineEverything and OutOfBandFieldErrors enabled:

# 43-bytes Base-16
0500026900020100000216622d782d792d6572726f7200020a1800080262000002780279000402027a0c02

# 43-bytes Escaped
\x05\0\x02i\0\x02\x01\0\0\x02\x16b-x-y-error\0\x02\n\x18\0\x08\x02b\0\0\x02x\x02y\0\x04\x02\x02z\x0c\x02

The main difference here is that each path segment may only be one of the following:

enum PathSegment {
    FieldName(NonEmptyString),
    ListIndex(UnsignedInteger),
}

It's expected that most path segments will be a non-empty String for a field name or alias, so by enforcing that the length of a FieldName segment is non-zero, we can use the NON_NULL label (which is 0) as a reserved label to indicate that a VARINT for a ListIndex is expected next instead of a String.

So, using the example from (B), the list index 0 in the path ["b", 0, "x", "y"] would normally be encoded as a String:

{Varint(1), "0"}

With the PATH type, it would be encoded as follows:

{Varint(0), Varint(0)}

Where the first Varint() refers to the label NON_NULL.

For single field errors, the difference between PATH and STRING[] is negligible: PATH should normally result in a same-size or slightly smaller encoding than the STRING[] variant (it is smaller in cases where a ListIndex has an integer value larger than 9).
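The two path encodings can be compared side by side with a small sketch. It assumes zigzag varints for labels and for the ListIndex value (consistent with the 00 00 bytes in the (C) encoding, though an unsigned varint would look the same for index 0), and every function name here is hypothetical.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer as a zigzag varint."""
    z = (n << 1) if n >= 0 else ((-n << 1) - 1)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_string_path(path: list) -> bytes:
    """(B): every segment, including list indices, is written as a String."""
    body = b"".join(encode_string(str(segment)) for segment in path)
    return zigzag_varint(len(path)) + body


def encode_path(path: list) -> bytes:
    """(C): label 0 is reserved to announce a ListIndex varint."""
    out = bytearray(zigzag_varint(len(path)))
    for segment in path:
        if isinstance(segment, int):
            if segment < 0:
                raise ValueError("ListIndex must be unsigned")
            out += zigzag_varint(0) + zigzag_varint(segment)
        else:
            if not segment:
                raise ValueError("FieldName must be non-empty")
            out += encode_string(segment)
    return bytes(out)


example = ["b", 0, "x", "y"]
print(encode_string_path(example).hex())  # 080262023002780279, as in (B)
print(encode_path(example).hex())         # 080262000002780279, as in (C)

# A two-digit index is where PATH starts to save a byte.
print(len(encode_string_path(["b", 10])), len(encode_path(["b", 10])))
```

Both outputs match the path bytes embedded in the 43-byte encodings of (B) and (C), which is a reasonable sanity check on the assumptions.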

More benefits from PATH will be described in more detail in a future issue related to @defer and @stream support, which heavily relies on the PATH type.

(D) Error type as `ERROR`

New proposed ERROR type, which is identical to (C) above, but can be re-used wherever the Error type may need to be used:

{
  data: {
    a?: STRING
    b: {
      x: {
        y: STRING?
      }
    }[]?
  }
  errors?: ERROR[]?
}

Encoding with InlineEverything and OutOfBandFieldErrors enabled:

# 43-bytes Base-16
0500026900020100000216622d782d792d6572726f7200020a1800080262000002780279000402027a0c02

# 43-bytes Escaped
\x05\0\x02i\0\x02\x01\0\0\x02\x16b-x-y-error\0\x02\n\x18\0\x08\x02b\0\0\x02x\x02y\0\x04\x02\x02z\x0c\x02

TODO: (E) Field errors encoding (not yet implemented)

For non-OutOfBandFieldErrors mode, the ERROR label could be written to an otherwise null field and immediately be followed by a list(?) of ERROR values.

I don't think the GraphQL specification limits how many errors might be returned for a particular path, so I'm assuming Argo would need to support a list of field errors.

It might be possible to drop the path field when encoded as a field error. This is assuming that the path could be reconstructed by walking the decoded field itself in combination with the given wire type. For field errors involving arrays, this might work, but for incremental updates it's unclear whether path may be omitted or not.

Answer 1 · 2023-11-10T22:26:55.000Z

Thanks for opening this issue Andrew!

Here are a few initial thoughts on A-E, as well as a new thought F:

(A) Error type as DESC

The difference in size seems ok to me for traditional request/response workflows, but I do understand that DESC is a bit annoying and unsatisfying, and that we're building up to stream/defer use cases, so I'll leave this one alone.

(B) Error type as RECORD with STRING[] path

The upshot here is relative simplicity, relatively easy debuggability, and easy pairing with BLOCK-deduping for path elements (which seems likely to help quite a bit in stream/defer).

The downside is that, in practice, the string paths could be large-ish. They might well dominate the payload size, but in these cases the total payloads are small (say, hundreds of bytes).

(C) Error type as RECORD with PATH path

Not a bad idea. It introduces an Argo-native concept to the strange path encodings demanded by the GraphQL spec, which might be helpful.

The use of 0 is clever and seems unlikely to be a real problem, but it does muddy up the meaning of Labels just a bit, and it introduces a new concept to Argo (basically, tagged unions).

(D) Error type as ERROR

There might be some upshot in having names for common types, but I think it's preferable to achieve this without introducing new kinds of Wire types. This way, code that handles Wire types doesn't need to know about ERROR specifically.

Contrasting with PATH, the case seems less strong for introducing a first-class concept/type.

(E) Field errors encoding (not yet implemented)

Yeah, this is described in 5.8.3 Field errors, and you need a partial path to know where the errors bubbled up from (if originating in a nested non-nullable field). However, I expect the space saving is unimportant for most traditional request/response use cases, and it seems somewhat unlikely to be possible in stream/defer. I think that means the approaches other than E are more worth looking into.

You make a good point about multiple errors, I should look into whether it ought to be an inline array of error values.

(F) Error type as `RECORD` with `VARINT[]` path

I was thinking about how this might be made maximally compact without making things too complex, and I had this idea: all the Wire types have a deterministic order already, so maybe we could transform a GraphQL spec-compliant path to an array of VARINT by replacing each field name with the index of the corresponding field in its parent RECORD. (Array indices can be left alone.) This could be reconstructed on the other side given what we know from Wire types. These integers would typically be very small (growing only with the number of fields in each RECORD), so we could expect most Path values to use roughly 1 byte per level of nesting.

It's an extra step, but if PATH size is important for stream/defer, it might be helpful. We may still want an explicit PATH type.
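A rough sketch of the transformation in (F), using a toy model of the wire type for the example response. `Record`, `Array`, `path_to_ints`, and `ints_to_path` are hypothetical names for illustration and are not taken from the reference implementation or the integer-paths branch; nullable and omittable wrappers are ignored.

```python
from dataclasses import dataclass


@dataclass
class Record:
    fields: dict  # field name -> wire type, kept in wire order


@dataclass
class Array:
    of: object  # element wire type


def path_to_ints(wire, path):
    """Replace each field name with its index in the parent record."""
    ints = []
    for segment in path:
        if isinstance(wire, Array):
            ints.append(int(segment))  # array indices are left alone
            wire = wire.of
        elif isinstance(wire, Record):
            ints.append(list(wire.fields).index(segment))
            wire = wire.fields[segment]
        else:
            raise ValueError(f"path continues past a leaf at {segment!r}")
    return ints


def ints_to_path(wire, ints):
    """Rebuild the GraphQL path by walking the same wire type."""
    path = []
    for n in ints:
        if isinstance(wire, Array):
            path.append(n)
            wire = wire.of
        elif isinstance(wire, Record):
            name = list(wire.fields)[n]
            path.append(name)
            wire = wire.fields[name]
        else:
            raise ValueError(f"path continues past a leaf at {n!r}")
    return path


# Toy wire type for the `data` record of the example response.
DATA = Record({
    "a": "STRING",
    "b": Array(Record({"x": Record({"y": "STRING"})})),
})

ints = path_to_ints(DATA, ["b", 0, "x", "y"])
print(ints)                      # [1, 0, 0, 0]
print(ints_to_path(DATA, ints))  # ['b', 0, 'x', 'y']
```

If each of those integers is written as a single-byte varint, the four segments take 4 bytes, compared with 8 bytes for the same segments in (B) and (C).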

You can see a prototype in the reference implementation on the integer-paths branch: https://github.com/msolomon/argo/compare/integer-paths

Let me know what you think!

Answer 2 · 2023-11-14T16:26:44.000Z

I like (F); I think it has much better potential for savings compared to (B) and (C). I think (D) is most useful for codegen purposes, but otherwise I agree that having it as its own wire type doesn't provide anything new.

I'll put together another issue soon-ish with more details about defer/stream, which is where I'm hoping the optimized PATH type (F) will be more useful.
