aws/aws-xray-sdk-node

Memory leak in Node.js - timed-out requests

slice-dinesh opened this issue · 9 comments

We are using the AWS X-Ray Node.js SDK in one of our projects. This API calls several downstream services using Axios. Whenever a downstream API call times out, we see a memory spike. A heap dump shows that the data sent in those API calls is still held in memory as strings. How can we fix this?

If we disable X-Ray by commenting out the lines below, there is no spike:

AWSXRay.captureHTTPsGlobal(require('http'));
AWSXRay.captureHTTPsGlobal(require('https'));

Hi @slice-dinesh
When the downstream call begins, the subsegment the SDK creates is stored in continuation-local storage (cls-hooked, the Node.js counterpart of thread-local storage), which is cleared once the call completes. I suspect this subsegment is what leaks in memory when the call times out.
Could you post sample code showing how you use the X-Ray SDK to instrument the downstream call and, if possible, the string you are seeing in memory? Feel free to redact any sensitive information from the string.
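For readers unfamiliar with the mechanism: the SDK keeps the active segment/subsegment in continuation-local storage rather than in a literal thread-local. The sketch below uses Node's built-in `AsyncLocalStorage` purely as an analogy (it is not what this SDK version uses internally, and `runWithSubsegment` is a hypothetical helper). It shows that a context is visible only inside the work it wraps. Roughly speaking, the flip side is that async resources (sockets, timers) created inside that work keep the context reachable until they are torn down, which is why a timed-out call that never cleans up could retain it.

```javascript
const { AsyncLocalStorage } = require('async_hooks');

// Analogy only: AsyncLocalStorage stands in for the cls-hooked namespace
// the X-Ray SDK uses; this is not the SDK's own code.
const storage = new AsyncLocalStorage();

// Hypothetical helper: run one unit of work (e.g. one downstream call)
// with its own "subsegment" as the active context.
function runWithSubsegment(name, work) {
  const subsegment = { name, startTime: Date.now() };
  // run() returns whatever the callback returns.
  return storage.run(subsegment, work);
}

// Inside the wrapped work, the current subsegment is visible...
const seenInside = runWithSubsegment('downstream-call', () => storage.getStore().name);

// ...and outside of it there is no active context.
const seenOutside = storage.getStore();

console.log(seenInside);  // downstream-call
console.log(seenOutside); // undefined
```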

The code flow is as follows (all of this happens at server startup):

// First, we enable capture globally:
AWSXRay.captureHTTPsGlobal(require('http'));
AWSXRay.captureHTTPsGlobal(require('https'));

// Second, we mount the openSegment middleware, then mount the routes:
expressApp.use(AWSXRay.express.openSegment('XXXX'));

// Once the routes are mounted, we mount the closeSegment middleware:
expressApp.use(AWSXRay.express.closeSegment());

That is the whole setup.

Also, we use Axios to make the API calls that time out, and the string held in memory is the entire data object (the request body) sent in that call.

This seems to be an issue with cls-hooked: there is a map that is not getting cleaned up. @srprash, any update on this?
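To make the suspected failure mode concrete, here is a deliberately simplified model of the per-namespace bookkeeping (not cls-hooked's actual code; `ContextMap`, `init`, and `destroy` here are hypothetical names). As I understand it, the namespace records the active context for each async resource when the resource is created and removes that entry only when the resource's `destroy` hook fires. If a timed-out request leaves behind a resource that is never destroyed, its entry, and everything the context references, stays in the map.

```javascript
// Hypothetical, simplified model of per-namespace context bookkeeping:
// one Map entry per async resource created while a context is active.
class ContextMap {
  constructor() {
    this.contexts = new Map(); // asyncId -> active context object
  }

  // Analogous to an async_hooks `init` callback: a resource (socket,
  // timer, promise) was created while `activeContext` was active.
  init(asyncId, activeContext) {
    if (activeContext) this.contexts.set(asyncId, activeContext);
  }

  // Analogous to a `destroy` callback: the only path that removes the entry.
  destroy(asyncId) {
    this.contexts.delete(asyncId);
  }
}

const registry = new ContextMap();

// One incoming request whose context references the outgoing body.
const requestContext = { segment: 'request-1', data: 'x'.repeat(1024) };

// Two downstream calls create async resources under that context.
registry.init(1, requestContext);
registry.init(2, requestContext);

// Call 1 completes and its resource is destroyed, so its entry is removed.
registry.destroy(1);

// Call 2 timed out and its resource was never destroyed (the suspected
// failure mode), so the context and its 1 KB string remain reachable.
console.log(registry.contexts.size); // 1
console.log(registry.contexts.get(2).data.length); // 1024
```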

As part of testing, I created a small Express server with Axios and AWS X-Ray, following the Express integration guide in the AWS docs. The same problem exists there as well. The heap dump shows a huge map in cls-hooked's context.js file that keeps references to all of these data strings (the data sent in the API calls).

Hi @slice-dinesh
These details are very helpful. Would you mind pushing your test app to a repo and providing a link here?
We can use the app to debug the issue.

Thanks!

Any update, @srprash?

Hi @slice-dinesh,

I took a look at this and had a tough time finding the memory leak. The CLS namespace used by the X-Ray SDK was cleaned up correctly after each request to the server, even when the request timed out. Could you provide more detailed reproduction instructions, e.g. output showing where memory is being consumed without bound?
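One way to gather the kind of evidence requested here is to log memory usage periodically and take heap snapshots before and after a burst of timed-out calls. A minimal sketch using only Node built-ins (`process.memoryUsage()` and `v8.writeHeapSnapshot()`); the helper names `sampleMemory`, `startMemoryLogger`, and `dumpHeap` are hypothetical.

```javascript
const v8 = require('v8');

// Hypothetical helper: current process memory, in megabytes (1 decimal).
function sampleMemory() {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return { rssMB: toMB(rss), heapUsedMB: toMB(heapUsed), heapTotalMB: toMB(heapTotal) };
}

// Log a sample every `intervalMs`; unref() ensures the timer alone does
// not keep the process alive.
function startMemoryLogger(intervalMs) {
  const timer = setInterval(() => console.log(JSON.stringify(sampleMemory())), intervalMs);
  timer.unref();
  return timer;
}

// Write a heap snapshot (loadable in Chrome DevTools) and return its
// file name, so retainers of the leaked strings can be compared.
function dumpHeap(label) {
  return v8.writeHeapSnapshot(`heap-${label}-${Date.now()}.heapsnapshot`);
}
```

For example, call `startMemoryLogger(10000)` at startup, then `dumpHeap('before')` and `dumpHeap('after')` around a batch of requests to a deliberately slow endpoint. If `heapUsedMB` keeps climbing and the second snapshot shows the request bodies retained by cls-hooked's context map, that would support the leak report.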

Also, for new tracing projects you can check out the AWS Distro for OpenTelemetry JavaScript. It has robust community support and first-class support for X-Ray; you can find the getting-started docs here: https://aws-otel.github.io/docs/getting-started/js-sdk/trace-manual-instr