googleapis/cloud-profiler-nodejs

Cloud Profiler has a memory leak when used with Node.js v12.16.0 or greater and v14.0.0 or greater.

plegner opened this issue · 16 comments

Using the Cloud Profiler on AppEngine NodeJS Managed VMs causes a memory leak.

  • Node.js version: 12.x
  • @google-cloud/profiler version: ^4.0.3
  • Environment: App Engine Managed VMs

Here is a memory profile with the cloud profiler enabled:

[Image: before-memory (memory usage with the profiler enabled)]

And here is the same server running with the cloud profiler disabled:

[Screenshot 2020-10-07 at 09:22:10 (memory usage with the profiler disabled)]

The only difference is commenting out this line:

if (env === 'production') require('@google-cloud/profiler').start()

Thank you for reporting this. Does this happen on one specific service or does it happen on other services as well?

Do you run more than one instance of this service?

I only have a single service where I've used the profiler, but it happens on both instances running that service. Happy to send you the project/service ID.

(I had previously filed case #25366321 in Cloud support, and one of your colleagues suggested removing the profiler.)

I was able to reproduce this on my Linux workstation with Node.js 12.18.0, using the AcmeAir application.

Logging process.memoryUsage() shows that the memory increase is in RSS, with no corresponding increase in the heapTotal, heapUsed, external, or arrayBuffers categories.
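One way to watch for this RSS-only growth is to sample process.memoryUsage() periodically and compare the first and last samples. This is a minimal sketch, not the tooling used in this investigation; memoryGrowth, rssOnlyGrowth, startSampling, and the threshold parameter are hypothetical names.

```javascript
'use strict';

// Non-RSS fields reported by process.memoryUsage(). arrayBuffers only exists on
// newer Node.js releases, so a missing field is treated as zero growth below.
const HEAP_FIELDS = ['heapTotal', 'heapUsed', 'external', 'arrayBuffers'];

// Growth of each memoryUsage() field between the first and last sample.
function memoryGrowth(samples) {
  if (samples.length < 2) {
    throw new Error('need at least two samples');
  }
  const first = samples[0];
  const last = samples[samples.length - 1];
  const growth = {};
  for (const key of ['rss', ...HEAP_FIELDS]) {
    growth[key] = (last[key] || 0) - (first[key] || 0);
  }
  return growth;
}

// True when RSS grew by more than thresholdBytes while none of the heap,
// external, or ArrayBuffer figures grew by that much: the extra memory is
// outside what V8's heap accounting reports.
function rssOnlyGrowth(growth, thresholdBytes) {
  return (
    growth.rss > thresholdBytes &&
    HEAP_FIELDS.every((key) => growth[key] < thresholdBytes)
  );
}

// Take one sample now, then one every intervalMs. unref() ensures the timer
// alone does not keep the process running.
function startSampling(intervalMs) {
  const samples = [process.memoryUsage()];
  const timer = setInterval(() => samples.push(process.memoryUsage()), intervalMs);
  timer.unref();
  return { samples, stop: () => clearInterval(timer) };
}
```

For example, a server could call startSampling(60 * 1000) at startup and, a few hours later, pass memoryGrowth(sampler.samples) to rssOnlyGrowth with a threshold such as 50 MB. The interval and threshold are arbitrary choices.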

My next steps are to:

  • reproduce using pprof-nodejs and a simpler-to-run application (this would allow us to confirm that the issue is with profile collection; it also generally allows us to adjust profile collection settings to make the memory leak happen faster).
  • determine whether this is specific to a version of @google-cloud/profiler or to a version of Node.js.
  • determine whether the leak is specific to time profiles or heap profiles.
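A reproducer along the lines of the first step could collect many short CPU profiles back to back and record RSS after each one, so that profiler start/stop cycles happen far more often than in normal profiling. This sketch uses Node's built-in inspector module as a stand-in for pprof-nodejs, which the thread actually names; whether the inspector path exercises the same leaking code is not established here. collectRepeatedly and its parameters are hypothetical.

```javascript
'use strict';
const inspector = require('inspector');

// Promise wrapper around the callback-style inspector.Session#post().
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical reproducer: keep one session (and its CPU profiler) enabled for
// the whole run and collect `count` short profiles back to back, recording RSS
// after each. Shorter durations mean more start/stop cycles per minute.
async function collectRepeatedly(count, durationMs, samplingIntervalUs) {
  const session = new inspector.Session();
  session.connect();
  const rssAfterEach = [];
  try {
    await post(session, 'Profiler.enable');
    // Must be set before Profiler.start.
    await post(session, 'Profiler.setSamplingInterval', { interval: samplingIntervalUs });
    for (let i = 0; i < count; i++) {
      await post(session, 'Profiler.start');
      await sleep(durationMs);
      await post(session, 'Profiler.stop'); // profile discarded; only RSS matters here
      rssAfterEach.push(process.memoryUsage().rss);
    }
  } finally {
    session.disconnect();
  }
  return rssAfterEach;
}
```

For example, collectRepeatedly(10000, 10, 100) runs for roughly 100 seconds of profiling and returns one RSS reading per profile; comparing the first and last readings shows whether RSS keeps climbing. The counts and intervals are arbitrary and would need tuning.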

@plegner, do you happen to know the specific version of Node.js being used?

I'm running an application with node 12.0.0 for comparison; so far, it looks like there is a memory leak when I use node v12.18.0, but not when I use node v12.0.0.

There does appear to be an issue on the Chromium bug tracker for this:
https://bugs.chromium.org/p/v8/issues/detail?id=10883&q=profiler&can=2

I'm using "engines": { "node": "12.x" } in package.json, but I don't know which specific Node.js version App Engine picks.

@nolanmar511 Thank you for the analysis.
Let's recommend against using 12.18.0 with Profiler, on the repo home page and in the GCP docs.

I assume this issue still impacts 12.19.0.

Let me confirm that this issue does not impact 12.17.0 (the latest version before 12.18); then we update the docs?

I should also see if Node 14 is impacted (my guess is yes).

I haven't been able to create a smaller reproducer yet, so this is still slow to determine.

It looks like node v12.17.0 may also be impacted.

At this point, I would recommend that users either use Node 12.15.0 or earlier, or not use the Node.js profiling agent.
Node 12.16.0 and later, and Node 14.0.0 and later, have memory leaks when profiling agents are enabled.
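Until an agent with the fix is deployed, a service could skip starting the profiler on the affected versions. This is a minimal sketch; isReportedLeakyNodeVersion is a hypothetical helper that encodes only the ranges stated in this thread (12.16.0 and later on the 12.x line, and 14.0.0 and later). Versions the thread does not discuss, such as 13.x, are not flagged.

```javascript
'use strict';

// Hypothetical helper: true when a Node.js version string (for example
// process.version, such as 'v12.18.0') falls in a range this thread reports as
// leaking with the profiling agent: >= 12.16.0 on 12.x, or >= 14.0.0.
function isReportedLeakyNodeVersion(version) {
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) {
    throw new Error(`unrecognized Node.js version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  if (major === 12) {
    return minor >= 16;
  }
  return major >= 14;
}
```

At startup this could guard the original line, for example `if (env === 'production' && !isReportedLeakyNodeVersion(process.version)) require('@google-cloud/profiler').start()`. Logging process.version at startup also shows which exact version App Engine selected for "12.x". The end of this thread reports that the then-latest @google-cloud/profiler release no longer leaks, so a guard like this is only a stop-gap for older agent versions.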

Thank you for looking into this. Please add it to the known issues on https://cloud.google.com/profiler and on this repo's home page.

I logged a new bug in V8, v8:11051, as the other one turned out to be unrelated. Status update: I think we've found the root cause, and we're working on a fix, which we should be able to backport to Node 12 LTS.

There is a workaround the pprof module could use in bindings/profiler.cc: create a new CpuProfiler object for each profile, rather than keeping one alive when no profiles are running.
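The workaround itself is a change to pprof's C++ binding. As a rough JavaScript-level analogy only, this sketch applies the same lifecycle idea with Node's built-in inspector module: it creates a fresh session for each profile and tears it down afterwards, instead of keeping one alive between profiles. collectOneProfile is a hypothetical helper, and the sketch does not claim that disconnecting an inspector session releases V8's profiler the way the pprof fix does.

```javascript
'use strict';
const inspector = require('inspector');

// Promise wrapper around the callback-style inspector.Session#post().
function post(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: every call creates a brand-new session and disconnects
// it before returning, so this code holds no profiler session between profiles.
async function collectOneProfile(durationMs) {
  const session = new inspector.Session();
  session.connect();
  try {
    await post(session, 'Profiler.enable');
    await post(session, 'Profiler.start');
    await sleep(durationMs);
    const { profile } = await post(session, 'Profiler.stop');
    await post(session, 'Profiler.disable'); // turn profiling off before teardown
    return profile;
  } finally {
    session.disconnect();
  }
}
```

A caller that profiles periodically would invoke collectOneProfile once per collection window; the cost is some setup work per profile, in exchange for nothing profiler-related being retained while idle.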

Awesome!
I'm working on implementing and testing the work-around in pprof-nodejs now.

The fix is now released; I'm validating that it works as expected locally and will update here in a few hours.

I've confirmed that the latest version of @google-cloud/profiler does not have the memory leak.