DataDog/datadog-lambda-js

Inferred spans feature breaks when Kinesis stream is gzip encoded

vvo opened this issue · 6 comments

vvo commented

Expected Behavior

When using the default value of DD_TRACE_MANAGED_SERVICES (true) and your Kinesis stream is encoded in gzip/deflate then a JSON.parse error is thrown.

This is because the stream we're dealing with is encoded in gzip. This is the case when using https://github.com/aws/amazon-kinesis-streams-for-fluent-bit in an infrastructure.

Thankfully this can be disabled with DD_TRACE_MANAGED_SERVICES=false. Update: this parameter does not work to disable the error as seen here: #264 (comment)

Actual Behavior

This should either work or not throw by default.

Steps to Reproduce the Problem

  1. Create a Kinesis stream
  2. Encode records with gzip
  3. Use this extensions

Specifications

  • Datadog Lambda Layer version: 68
  • Node version: 14.17.6

Stacktrace

SyntaxError: Unexpected token � in JSON at position 0
  at JSON.parse (<anonymous>)
  at readTraceFromKinesisEvent (/opt/nodejs/node_modules/datadog-lambda-js/trace/context.js:254:35)
  at readTraceFromEvent (/opt/nodejs/node_modules/datadog-lambda-js/trace/context.js:451:16)
  at extractTraceContext (/opt/nodejs/node_modules/datadog-lambda-js/trace/context.js:55:17)
  at TraceContextService.extractHeadersFromContext (/opt/nodejs/node_modules/datadog-lambda-js/trace/trace-context-service.js:26:67)
  at TraceListener.onStartInvocation (/opt/nodejs/node_modules/datadog-lambda-js/trace/listener.js:95:52)
  at /opt/nodejs/node_modules/datadog-lambda-js/index.js:149:56
  at step (/opt/nodejs/node_modules/datadog-lambda-js/index.js:44:23)
  at Object.next (/opt/nodejs/node_modules/datadog-lambda-js/index.js:25:53)
  at /opt/nodejs/node_modules/datadog-lambda-js/index.js:19:71
vvo commented

cc @astuyve since you introduced the feature, 🙏

@vvo - thanks for the report! This error should be caught and logged, so it shouldn't halt lambda progression. But I will attempt to recreate this behavior in order to support tracing across Kinesis streams when data is gzipped.

Thanks!

vvo commented

@astuyve Just came here to say:

  1. Thank you for the fast reply
  2. No it does not halt lambda progression, but I would love to not have to ignore errors in logs (nor tell people that "this error is fine").
  3. DD_TRACE_MANAGED_SERVICES=false doesn't actually prevent the error. Because no matter its value we always try to decode the Kinesis data (this is more "problematic" since there's no escape hatch now).

FWIW the data can be either gzip or deflate. So zlib.unzip works well (this is what we're doing).

We want our stream consumer to be efficient. So if using datadog-lambda-js (the layer, through the cdk construct) means that we will decode twice records then I am a little worried about the performance impact. base64 decode + unzipping of data can take up to 200ms in our testing (500 records). Ideally, this would be done once.

Last, I couldn't find the actual benefits of inferred spans (I am a Datadog/tracing newbe), is there any documentation about why it's great (and activated by default).

Thanks @vvo!

The benefit is tracing and monitoring managed services - you can read more here https://www.datadoghq.com/blog/monitor-aws-fully-managed-services-datadog-serverless-monitoring/
This includes quantile metrics for latency.

You can disable creating the span with DD_TRACE_MANAGED_SERVICES=false, as per the readme, but it does still attempt to parse context. We can extend that to here as well.

Perhaps we can expose the decoded and unzipped payload alongside the original payload, offering consumers the ability to read the already parsed data? Alternatively you can reconstruct trace context manually

vvo commented

Perhaps we can expose the decoded and unzipped payload alongside the original payload, offering consumers the ability to read the already parsed data?

Yes, that would be ideal indeed.

but it does still attempt to parse context. We can extend that to here as well.

Again yes that would be awesome. And a note in the README maybe while we're not yet decoding zips.

Correct, I amended my earlier comment which I posted too quickly.