error.description standard tag

Question

Closed this issue 8 years ago · 23 comments

Hello,

There is a boolean error tag to denote that a span is in an error state. Would it make sense to add a string error.description standard tag to include more details about the error?

Answer 1 · 2017-01-04T09:26:56.000Z

The error.description tag does not seem necessary to me. If the span has an error, as you mentioned, there may be more than one.
So I prefer to treat error.description as a list, and the entries should be logged rather than tagged.

But the current log() may not be enough. If we just log the detailed messages, reading and using them in the backend may be too complex.

@bensigelman any thoughts?
I saw something related in #6, but I'm not sure, and I missed the conclusion. 😭 @adriancole, maybe you know this.

Answer 2 · 2017-01-04T17:18:01.000Z

I agree with @wu-sheng, it's better to log errors than to save them as tags, because an exception is an event in time. The standard error tag is meant to flag the whole operation represented by the span as failed.

Having said that, we currently do not have standard labels for log fields, aside from the semi-official event field. But if you start using error as a log field in the instrumentation, I'd say there's a high probability that it will become a standard field. We are already using it in our instrumentations.

Answer 3 · 2017-01-04T23:35:25.000Z

@yurishkuro, I saw that #6 was closed.
Was it agreed to add error to the spec?
The conclusion is not clear to me.

Answer 4 · 2017-01-05T01:39:19.000Z

In Zipkin we set the error tag value as the description. If the tag exists at all, we know there's an error. If someone is trying to be super size-efficient, they can always set an empty string.
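
As an illustration of the convention described above (the presence of the tag signals failure, and its value carries the description), here is a minimal sketch. The helper names `mark_failed` and `is_failed` and the plain-dict tag store are assumptions made for this example; they are not Zipkin or OpenTracing APIs.

```python
def mark_failed(tags, description=""):
    """Set the error tag; its value is the description (may be empty)."""
    tags["error"] = description
    return tags


def is_failed(tags):
    """A span counts as failed if the error tag exists at all."""
    return "error" in tags


# A descriptive failure and a size-efficient one with an empty description.
described = mark_failed({}, "connection refused")
compact = mark_failed({})
```

Checking for the key rather than for a truthy value is what lets the empty-string form still count as an error.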

Answer 5 · 2017-01-05T01:43:42.000Z

@adriancole, if there is more than one error in a span, might a tag not be enough?

Answer 6 · 2017-01-05T02:16:57.000Z

we use annotate (event in OT) for potentially resolvable timestamped errors, and a tag to indicate complete failure of the span https://github.com/openzipkin/zipkin-api/blob/master/thrift/zipkinCore.thrift#L218

Answer 7 · 2017-01-05T14:07:36.000Z

Thanks for the clarification 👍. It would also make sense to me if the error tag were a string. This makes it easy to add high-level information about why the error happened (e.g. an exception message) and to use logs for more details. (As Adrian mentioned, it can be left empty.)

Answer 8 · 2017-01-05T15:35:10.000Z

At the spec level, error stays a tag key (the value is boolean), and adding a log method should be enough.
At the tracer implementation level, as in Zipkin, you can do the log and the tag together in the error method, as Cole said. This depends on the tracer.

Answer 9 · 2017-01-06T16:31:01.000Z

we currently use:

error.msg = "ZeroDivisionError: integer division or modulo by zero"

error.type = ZeroDivisionError

error.stack = """Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero"""

i think the type is probably redundant. the stack is an arbitrary string field which can be interpreted based on the language that the span was generated in.
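
For readers who want to see how the three fields above could be derived from a caught exception, here is a minimal Python sketch. The field names come from the comment above; the helper `error_fields` is a hypothetical name introduced for this example, and the exact message text depends on the Python version.

```python
import traceback


def error_fields(exc):
    """Build error.msg, error.type and error.stack from a caught exception."""
    type_name = type(exc).__name__
    return {
        "error.msg": f"{type_name}: {exc}",
        "error.type": type_name,
        # format_exception returns a list of lines; join them into one string.
        "error.stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


try:
    1 / 0
except ZeroDivisionError as exc:
    fields = error_fields(exc)
```

Because the message already starts with the type name, the separate type field carries no extra information here, which matches the remark that it is probably redundant.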

Answer 10 · 2017-01-06T17:58:18.000Z

Span error-ness should remain a bool as others have said. This is useful and unambiguous in analytical tools.

I am very much in favor of standardizing log key+values for errors that take place during a Span (also as others have suggested above)... my strawman, then:

- Span:
  - error, a bool
- Log:
  - event: "error"
  - message: a string
  - stack: a string

I can see some value in eventually distinguishing between hard and soft (i.e., recoverable) errors, but I'd rather wait until there's a pressing need.

Thoughts? If this sounds good I can send out a PR against the yaml file in this repo.
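
A rough sketch of what the strawman above could look like in instrumentation code. `SketchSpan` is a hypothetical stand-in written for this example so that it runs on its own; its `set_tag` and `log_kv` methods are modeled on the OpenTracing Python API, but a real tracer's span would be used instead. `record_error` is likewise an assumed helper name.

```python
import time
import traceback


class SketchSpan:
    """Hypothetical stand-in for a tracer's span (not a real tracer class)."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}
        self.logs = []  # list of (timestamp, fields) pairs

    def set_tag(self, key, value):
        self.tags[key] = value

    def log_kv(self, fields):
        self.logs.append((time.time(), dict(fields)))


def record_error(span, exc):
    """Flag the whole span as failed and log the error as a timestamped event."""
    span.set_tag("error", True)  # span-level error-ness stays a bool
    span.log_kv({
        "event": "error",  # low-cardinality kind of event
        "message": str(exc),
        "stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    })


span = SketchSpan("divide")
try:
    1 / 0
except ZeroDivisionError as exc:
    record_error(span, exc)
```

Keeping the tag boolean and putting the details in a log entry means several errors can be recorded on one span, each with its own timestamp.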

Answer 11 · 2017-01-06T18:19:45.000Z

+1

Answer 12 · 2017-01-06T20:22:39.000Z

Sounds good, but the difference between message and event must be described clearly. E.g., are we supposed to use event on every log call to specify the kind (error, orderPlaced, ...)? Is it up to the user? ...

Answer 13 · 2017-01-06T22:40:54.000Z

@cwe1ss event is not supposed to be required on every log call per se, but the idea is for it to be a low-cardinality indicator of type of, well, event.

Answer 14 · 2017-01-06T23:54:21.000Z

@bensigelman, so error-ness stays a boolean.
And logging like this means logging a map that includes these 3 keys? If so, I think that is great.

- Log:
  - event: "error"
  - message: a string
  - stack: a string

Answer 15 · 2017-01-07T02:18:48.000Z

@bensigelman yep, just saying it should go into the docs!

Answer 16 · 2017-01-10T12:03:09.000Z

I still have some questions about the usage/semantics of the error tag:

Should it represent application errors (exceptions) or faults in terms of a business path (e.g. when _Account Not Found_ is returned, which takes the business process down alternate paths)?

Answer 17 · 2017-01-10T13:40:56.000Z

@pavolloffay, why use OT to alternate business process paths?

When Account Not Found happens, your original application code should try/catch the exception and process it, as if OT did not exist. This should not be the OT API's responsibility.

Answer 18 · 2017-01-10T17:03:07.000Z

@pavolloffay OpenTracing spec does not prescribe that. Conceptually, if you want to distinguish business errors from infra errors, you can separate them in different spans and still use the same error tag. I.e. have a parent span for the business operation, and individual spans for infra operations (like RPCs).
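
To make the parent/child idea above concrete, here is a small sketch. `SketchSpan` is a hypothetical stand-in written for this example, not an OpenTracing class, and the operation names are invented. Whether _Account Not Found_ counts as an error for the business span is an application decision; the sketch assumes it does.

```python
class SketchSpan:
    """Hypothetical stand-in for a tracer's span (not a real tracer class)."""

    def __init__(self, operation_name, parent=None):
        self.operation_name = operation_name
        self.parent = parent
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value


# Parent span for the business operation, child span for the RPC it makes.
business = SketchSpan("open_account")
lookup = SketchSpan("rpc:get_account", parent=business)

# Assume the RPC completed normally but reported that the account is missing.
account = None

if account is None:
    # Business-level failure: flag the parent with the same error tag.
    # The RPC span is left unflagged because the call itself succeeded.
    business.set_tag("error", True)
```

An analysis tool can then separate the two kinds of failure by looking at which span carries the tag, without needing a second tag name.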

Answer 19 · 2017-01-10T23:37:38.000Z

@pavolloffay, it seems you want to log the process terms? This is not the error tag's duty.

Answer 20 · 2017-01-27T18:50:53.000Z

anything blocking us from accepting @bensigelman's thoughts here? i'm a +1.

Answer 21 · 2017-01-27T20:01:41.000Z

@clutchski LGTM! All that needs to happen is a PR: #35

Answer 22 · 2017-02-15T07:06:22.000Z

Since #35 has been merged, this issue could be closed.

@pavolloffay, is that good enough?

Answer 23 · 2017-02-19T18:11:04.000Z

Yes, I'm closing it.