opentracing/specification

Introduce new tag to record client timeouts

yurishkuro opened this issue · 3 comments

Background

When analyzing traces for latency variance, or for the impact of downstream services' SLOs on upstream services, it is useful to know whether a client span finished in error because of a timeout rather than a bad response from the server.

Proposal

Introduce a new tag timeout: bool, similar to error: bool, that the client span can be marked with.
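A minimal sketch of how instrumentation might set the proposed tag. The Span class and the traced_call helper are hypothetical stand-ins, not part of any OpenTracing API; only the timeout tag name comes from this proposal, and error and span.kind are the existing standard tags.

```python
import socket


class Span:
    """Hypothetical stand-in for a tracer's span; it only stores tags."""

    def __init__(self, operation_name):
        self.operation_name = operation_name
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value
        return self


def traced_call(operation_name, call):
    """Run `call` under a client span and tag errors and timeouts."""
    span = Span(operation_name)
    span.set_tag("span.kind", "client")
    try:
        call()
    except (socket.timeout, TimeoutError):
        # The client gave up waiting: mark both the error and the proposed tag.
        span.set_tag("error", True)
        span.set_tag("timeout", True)
    except Exception:
        # Any other failure (e.g. a bad response) is an error but not a timeout.
        span.set_tag("error", True)
    return span
```

With this shape, a query for timeout=true selects only the client spans that ended because the caller stopped waiting, while error=true still matches every failed span.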

Questions to address

We do not have a clear recommendation on modeling timeouts and retries in the OT docs at the moment. Should there be some other tag indicating that span B was a retry of span A because span A timed out?

Definitely a good idea to identify timeouts. Just wondering whether it would be better to have a more general tag that allows classifying other error types, e.g. error.type or error.kind (reusing the log field)?
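For comparison, a rough sketch of the more general alternative: one classifying tag instead of a boolean per error type. The kind values ("timeout", "dns_lookup", "connection", "other") and both helper functions are assumptions made for illustration; nothing in the spec defines them.

```python
import socket


def classify_error(exc):
    """Map an exception to a coarse, hypothetical error kind."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    if isinstance(exc, socket.gaierror):
        return "dns_lookup"
    if isinstance(exc, ConnectionError):
        return "connection"
    return "other"


def error_tags(exc):
    """Build the tags a client span would carry for a failed call."""
    return {"error": True, "error.kind": classify_error(exc)}
```

The trade-off is that a single tag stays open-ended as new error types come up, but it only helps analysis if the set of values is agreed on across instrumentations.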

Are you suggesting a reference to represent the retry relationship? That would depend on whether the span associated with the request being retried is generally available, although that is more of an implementation detail.

+1 for timeout: bool tag.

If one error-specific tag is added, eventually there will be requests for all error types, such as DNS lookup errors, NoResponse, SocketClosed, etc. FWIW, supporting error-specific tags may need a broader view and effort than just adding a single one.