`downstreamXRayEnabled` / `traced` property behavior doesn't seem to align with X-Ray documentation

Question

jmm opened this issue 2 years ago · 2 comments

Hello, I'm confused about the behavior of the downstreamXRayEnabled parameter to captureHTTPsGlobal. The documentation and behavior seem aligned that when true, a top-level traced: true property is created on the subsegment.

However, the AWS X-Ray segment documents documentation only documents traced as a property of the http.request object of subsegments, whereas the SDK seems to explicitly move it from there to the top level of the subsegment.
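
To make the discrepancy concrete, here is a minimal sketch of the two subsegment shapes being compared. The ids, names, and URLs are made up, and the helper `tracedLocation` is hypothetical; only the placement of `traced` reflects what is described above.

```javascript
// Shape described by the segment documents documentation:
// `traced` sits inside the http.request block of the subsegment.
const documentedShape = {
  id: "53995c3f42cd8ad8", // made-up id
  name: "api.example.com",
  namespace: "remote",
  http: {
    request: { url: "https://api.example.com/health", method: "GET", traced: true },
  },
};

// Shape seen in the raw trace data when downstreamXRayEnabled is true:
// `traced` sits at the top level of the subsegment.
const observedShape = {
  id: "53995c3f42cd8ad9", // made-up id
  name: "api.example.com",
  namespace: "remote",
  traced: true,
  http: {
    request: { url: "https://api.example.com/health", method: "GET" },
  },
};

// Hypothetical helper: report where (if anywhere) the traced flag is set.
function tracedLocation(subsegment) {
  const locations = [];
  const request = (subsegment.http || {}).request || {};
  if (subsegment.traced === true) locations.push("top-level");
  if (request.traced === true) locations.push("http.request");
  return locations;
}

console.log(tracedLocation(documentedShape));
console.log(tracedLocation(observedShape));
```

Running a helper like this over the subsegments in the raw trace data is one way to confirm which of the two placements the SDK actually emitted.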

Can you please clarify the difference between that documentation and the behavior here?

Also, in practice, I can't observe a meaningful difference between downstreamXRayEnabled: false (or omitted) and downstreamXRayEnabled: true. In the true case I can see a top-level traced: true property on the subsegments in the raw trace data, but I can't observe a difference in the trace as displayed in the console, either for a request to an instrumented service or for a request to a non-instrumented external service (a subsegment that winds up with inferred: true).

It's also not clear what the segment documents documentation means by:

> [...] X-Ray considers the trace to be broken [...]

Answer 1 · 2022-03-28T22:04:23.000Z

Hi @jmm,

> Can you please clarify the difference between that documentation and the behavior here?

To be honest, I am not sure why we move the traced field up to the top-level subsegment in this case. However, you are correct that there's no practical behavior change for this flag. I believe it was originally added to perform some backend de-duplication around having both a client-side subsegment and server-side segment, but we ended up just always inferring a segment for a client-side interaction instead. I will check to see if the flag is still used for deduplication.

> [...] X-Ray considers the trace to be broken [...]

I believe this is just an optimization: it means the trace is expecting another update from the downstream traced service and should be ready to accept one. In practice a trace is just a collection of segments and can always have new segments added, so again I don't think X-Ray actually behaves any differently in this case.

Answer 2 · 2022-03-29T13:54:44.000Z

Hi @willarmiros, thanks for your reply.

> I believe it was originally added to perform some backend de-duplication around having both a client-side subsegment and server-side segment, but we ended up just always inferring a segment for a client-side interaction instead.

So the X-Ray concepts :: Subsegments documentation says:

> Subsegments represent your application's view of a downstream call as a client. If the downstream service is also instrumented, the segment that it sends replaces the inferred segment generated from the upstream client's subsegment. The node on the service graph always uses information from the service's segment, if it's available, while the edge between the two nodes uses the upstream service's subsegment.

I'm not sure if that's what you mean by de-duplication, but this indicates that there will always be a client-side subsegment, and there will always be a corresponding top-level segment -- that will either be inferred, or provided by another instrumented service. (And there's no indication that this behavior is related to the traced property.)

In my test, regardless of the value of downstreamXRayEnabled and presence or absence of traced: true as a top-level property on the subsegment, I get subsegments for calls to external and instrumented services, and I get an inferred top-level segment for a call to a non-instrumented external service, and a non-inferred top-level segment for a call to an instrumented service.
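
The check described above can be sketched as a small script over raw segment documents. The documents below are made up for illustration, and the function name `downstreamSummary` is hypothetical; the `inferred`, `traced`, `parent_id`, and `subsegments` fields are the ones discussed in this thread. The sketch only looks at first-level subsegments.

```javascript
// Made-up documents: one application segment with two remote subsegments,
// plus the two downstream segments that end up in the trace.
const sampleDocs = [
  {
    id: "a1",
    name: "my-app",
    subsegments: [
      { id: "s1", name: "instrumented.example.com", namespace: "remote" },
      { id: "s2", name: "external.example.com", namespace: "remote" },
    ],
  },
  { id: "b1", name: "instrumented-service", parent_id: "s1" },
  { id: "c1", name: "external.example.com", parent_id: "s2", inferred: true },
];

// For each first-level subsegment, find the segment whose parent_id points
// at it and report whether that segment was inferred or uploaded.
function downstreamSummary(documents) {
  const byParent = new Map();
  for (const doc of documents) {
    if (doc.parent_id) byParent.set(doc.parent_id, doc);
  }
  const rows = [];
  for (const doc of documents) {
    for (const sub of doc.subsegments || []) {
      const downstream = byParent.get(sub.id);
      let kind = "none";
      if (downstream) kind = downstream.inferred === true ? "inferred" : "uploaded";
      rows.push({ subsegment: sub.name, traced: sub.traced === true, downstream: kind });
    }
  }
  return rows;
}

console.log(downstreamSummary(sampleDocs));
```

In this made-up data neither subsegment carries `traced`, yet one downstream segment is uploaded and the other is inferred, which mirrors the observation that the outcome does not appear to depend on the flag.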

So traced: true as either a top-level property on subsegments or a property of http.request objects isn't necessary for me to get a non-inferred segment for a call to an instrumented service.

However, the AWS X-Ray segment documents documentation suggests that while the tracing is in progress there may be some difference in behavior between the time the subsegment for the call to the instrumented service is created and the time that service uploads its segment. It's unclear what that difference would be though. Again, I don't know what "broken" means here:

> traced – (subsegments only) boolean indicating that the downstream call is to another traced service. If this field is set to true, X-Ray considers the trace to be broken until the downstream service uploads a segment with a parent_id that matches the id of the subsegment that contains this block.
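
Read literally, the quoted condition can be written as a check over the documents of a trace. The following is only a sketch of that wording, not X-Ray's actual backend logic; the function names and sample documents are hypothetical, and it accepts `traced` in either of the two placements discussed in this thread.

```javascript
// Recursively collect subsegments, including nested ones.
function collectSubsegments(node, out = []) {
  for (const sub of node.subsegments || []) {
    out.push(sub);
    collectSubsegments(sub, out);
  }
  return out;
}

// Return the ids of subsegments marked traced (top level or http.request)
// for which no segment with a matching parent_id has been uploaded yet.
function pendingTracedSubsegments(segments) {
  const parentIds = new Set(segments.map((s) => s.parent_id).filter(Boolean));
  return segments
    .flatMap((segment) => collectSubsegments(segment))
    .filter((sub) => {
      const request = (sub.http || {}).request || {};
      return sub.traced === true || request.traced === true;
    })
    .filter((sub) => !parentIds.has(sub.id))
    .map((sub) => sub.id);
}

// Made-up upstream segment with one traced subsegment.
const upstream = {
  id: "a1",
  name: "my-app",
  subsegments: [{ id: "s1", name: "svc", http: { request: { traced: true } } }],
};

// Before the downstream service uploads its segment.
console.log(pendingTracedSubsegments([upstream]));
// After a segment with parent_id "s1" arrives.
console.log(pendingTracedSubsegments([upstream, { id: "b1", name: "svc", parent_id: "s1" }]));
```

Under this reading, a traced subsegment for a call to a non-instrumented service would stay in the pending list indefinitely, which is the scenario the next paragraph asks about.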

It's also unclear what would happen if http.request.traced: true were set on a subsegment for a call to a non-instrumented service where there's never going to be a segment uploaded and you need to wind up with an inferred segment. If that property would cause a problem for requests to non-instrumented services, it doesn't seem like an option that's set at the HTTP module level and applies to all requests would make sense.

So if the X-Ray documentation regarding the traced property is correct, the SDK doesn't seem to be setting it in the correct place.

And if http.request is the correct place to set it, and it were being set there based on downstreamXRayEnabled, it makes me wonder if it would cause a problem with requests (subsegments) to non-instrumented services that need to result in an inferred segment.

> I believe this is just an optimization: it means the trace is expecting another update from the downstream traced service and should be ready to accept one. In practice a trace is just a collection of segments and can always have new segments added, so again I don't think X-Ray actually behaves any differently in this case.

There must be some point to setting it, right?