error.description standard tag
Closed this issue · 23 comments
Hello,
there is a boolean error tag to denote that a span is in an error state. Would it make sense to add a string error.description standard tag to include more details about the error?
The error.description tag does not seem necessary to me. If the span has an error, as you mentioned, there may be more than one error. So I prefer to treat error.description as a list, and its entries should be logged rather than tagged.
But the current log() may not be enough. If we just log the detail messages, reading and using them in the backend may be too complex.
@bensigelman any thoughts?
I saw something related in #6, but I'm not sure, and I missed the conclusion. 😭 @adriancole, maybe you know about this.
I agree with @wu-sheng, it's better to log errors than to save them as tags, because an exception is an event in time. The standard error tag is meant to flag the whole operation represented by the span as failed.
Having said that, we currently do not have standard labels for log fields, aside from the semi-official event. But if you start using error as a log field in the instrumentation, I'd say there's a high probability that it will become a standard field. We are already using it in our instrumentations.
@yurishkuro, I saw that #6 was closed.
Is it agreed to add error to the spec?
The conclusion is not so clear to me.
@adriancole, if more than one error exists in a span, a tag may not be enough?
Thanks for the clarification 👍. It would also make sense to me if the error tag was a string. This makes it easy to add high-level information about why it happened (e.g. an exception message) and use logs to include more details. (As Adrian mentioned, it can be left empty.)
At the spec level, error stays a tag key (the value is a boolean), and adding a log method should be enough.
At the tracer implementation level, as in zipkin, you can do the log and the tag together in the error method, as Cole said. This depends on the tracer.
we currently use:
error.msg = "ZeroDivisionError: integer division or modulo by zero"
error.type = ZeroDivisionError
error.stack = """Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero"""
i think the type is probably redundant. the stack is an arbitrary string field which can be interpreted based on the language that the span was generated in.
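As a rough illustration of how those three values could be derived from a caught Python exception, here is a minimal sketch. The helper name `error_fields` is hypothetical and relies only on the standard `traceback` module; it is not part of any tracer API, and the key names simply mirror the ones quoted above.

```python
import traceback


def error_fields(exc):
    """Build the three error.* values from a caught exception.

    Hypothetical helper for illustration; the key names mirror the ones
    quoted in the comment above and are not a standard.
    """
    exc_type = type(exc).__name__
    return {
        "error.msg": "%s: %s" % (exc_type, exc),
        "error.type": exc_type,
        # format_exception returns a list of newline-terminated strings.
        "error.stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    }


try:
    1 / 0
except ZeroDivisionError as exc:
    fields = error_fields(exc)
```

Because the message built this way already starts with the exception class name, the sketch also shows why a separate type value may be redundant.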
Span error-ness should remain a bool as others have said. This is useful and unambiguous in analytical tools.
I am very much in favor of standardizing log key+values for errors that take place during a Span (also as others have suggested above)... my strawman, then:
- Span:
  - error, a bool
- Log:
  - event: "error"
  - message: a string
  - stack: a string
I can see some value in eventually distinguishing between hard and soft (i.e., recoverable) errors, but I'd rather wait until there's a pressing need.
Thoughts? If this sounds good I can send out a PR against the yaml file in this repo.
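To make the strawman concrete, here is a minimal sketch in Python. `SketchSpan` is a hypothetical in-memory stand-in, not a real tracer class; its `set_tag` and `log_kv` methods are assumed to mirror the general shape of a span API, and `record_error` is an illustrative helper, not something the spec defines.

```python
import time
import traceback


class SketchSpan:
    """Hypothetical in-memory span, only for illustration."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}
        self.logs = []

    def set_tag(self, key, value):
        self.tags[key] = value
        return self

    def log_kv(self, key_values, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.logs.append((ts, dict(key_values)))
        return self


def record_error(span, exc):
    """Flag the span as failed and log one error event for this exception."""
    span.set_tag("error", True)  # span-level error-ness stays a bool
    span.log_kv({
        "event": "error",
        "message": str(exc),
        "stack": "".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)
        ),
    })


span = SketchSpan("sketch_operation")

# Two different failures inside the same span.
for attempt in (lambda: 1 / 0, lambda: int("x")):
    try:
        attempt()
    except (ZeroDivisionError, ValueError) as exc:
        record_error(span, exc)
```

After both failures the span carries a single boolean `error` tag and two separate log entries, which is one way to handle the "more than one error in a span" concern raised earlier in the thread.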
Sounds good, but the difference between message and event must be described in a good way. E.g. are we supposed to use event for every log call to specify the kind (error, orderPlaced, ...)? Is it up to the user? ...
@cwe1ss event is not supposed to be required on every log call per se, but the idea is for it to be a low-cardinality indicator of the type of, well, event.
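One way to read "low-cardinality" is that a backend can group or count log records by event without parsing free-form messages. A small sketch, using made-up log records and the event names mentioned in the question above:

```python
from collections import Counter

# Made-up log records; only some carry an "event" key, since it is optional.
logs = [
    {"event": "error", "message": "connection refused"},
    {"event": "orderPlaced", "message": "order accepted"},
    {"message": "cache warmed"},
    {"event": "error", "message": "request timed out"},
]

# Count records per event kind, skipping records without one.
counts = Counter(record["event"] for record in logs if "event" in record)
```

The message values are all different, while event takes only a handful of distinct values, which is what makes it usable as a grouping key.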
@bensigelman, error-ness stays a boolean.
And logging like this means logging a map that includes these 3 keys? If so, I think that is great.
- Log:
  - event: "error"
  - message: a string
  - stack: a string
@bensigelman yep, just saying it should go into the docs!
I still have some questions about the usage/semantics of the error tag:
Should it represent application errors (exceptions) or faults in terms of a business path (e.g. when _Account Not Found_ is returned, which takes the business process down alternate paths)?
@pavolloffay, why use OT to alternate business process paths?
When Account Not Found happens, your original application code should try/catch the exception and process it, as if OT did not exist. This should not be the OT API's responsibility.
@pavolloffay OpenTracing spec does not prescribe that. Conceptually, if you want to distinguish business errors from infra errors, you can separate them in different spans and still use the same error tag. I.e. have a parent span for the business operation, and individual spans for infra operations (like RPCs).
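The parent/child split described above could look roughly like this. `SketchSpan` and its fields are hypothetical stand-ins rather than a real tracer API, the operation names are invented for the example, and the choice of which span gets flagged is just one possible reading of the suggestion.

```python
class SketchSpan:
    """Hypothetical span with an optional parent, for illustration only."""

    def __init__(self, operation_name, parent=None):
        self.operation_name = operation_name
        self.parent = parent
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value
        return self


# Parent span: the business operation as a whole.
business = SketchSpan("transfer_funds")

# Child span: one infrastructure call (e.g. an RPC) made on its behalf.
rpc = SketchSpan("accounts.lookup", parent=business)

# Assumed scenario: the RPC completes normally but reports that the account
# was not found, so only the business-level span is flagged as failed.
business.set_tag("error", True)
```

Both spans use the same `error` tag key; which level failed is visible from which span carries it.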
@pavolloffay, it seems you want to log the process terms? This is not the error tag's duty.
anything blocking us from accepting @bensigelman thoughts here? i'm a +1.
@clutchski LGTM! All that needs to happen is a PR: #35
As #35 has been merged, this issue can be closed.
@pavolloffay, is that good enough?
Yes, I'm closing it.