yaorg/node-measured

Unbounded memory usage

maxcountryman opened this issue · 12 comments

I'm noticing issues where Node processes consuming histograms (and, surprisingly, meters) are growing memory in a linear, unbounded fashion. Removing measured fixes the issue. I'm not very surprised that histograms are memory-hungry given they're backed by a binary heap, but I am much more surprised that the meters seem to be as well.

Is this something you've encountered before?

@maxcountryman I never observed this when I still used the library. The code was all written to avoid unbounded memory growth.

Perhaps this has something to do with the event emitters attached to these objects?

I implemented my own structures and observed no memory issues whatsoever.

I believe the source of these leaks is setInterval and perhaps the usage of unref. It's a fairly catastrophic leak in environments where a lot of data is being thrown at meters (it has taken down my EC2 instances several times now). I'm not really sure what the best "fix" for this is.
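
To illustrate the suspected mechanism, here is a generic Node sketch (not measured's actual code; makeLeakyMeter and its per-meter state are hypothetical): an interval's callback closes over per-meter state, so until the interval is cleared that state can never be garbage collected, and unref() does not change that.

```js
// Hypothetical illustration of the suspected leak mechanism (not measured's code).
function makeLeakyMeter() {
  const state = { counts: new Array(100000).fill(0) }; // stand-in for per-meter state
  const timer = setInterval(() => {
    state.counts[0]++; // the closure keeps `state` reachable for as long as the timer lives
  }, 5000);
  timer.unref(); // unref() only stops the timer from keeping the process alive;
                 // it does not stop the timer or release `state`
  return state;
}

// Create a new "meter" every 100 ms without ever clearing the old timers;
// heap usage grows roughly linearly because every `state` stays reachable.
setInterval(() => {
  makeLeakyMeter();
  console.log('heapUsed MB:', (process.memoryUsage().heapUsed / 1048576).toFixed(1));
}, 100);
```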

This is important enough to put a notice in the README, if it is verified to be true. As an example, we took this on as a dependency to avoid writing our own data structures and, as a result, having to write code to handle unbounded-memory edge cases, but it looks like it has these problems too. There might be other people with the same motives, and knowing these trade-offs could save them time.

I have used this library in production code for several years with Node 0.12, 4, and 6. We constantly monitor our process memory and I can NOT confirm that there is a leak. To anyone reading this thread, please trace the source of your leaks with a profiler and point us in the right direction. Should there be a memory leak, I'd instantly merge a PR with a fix.

IIRC the problem turned out to be the binary heap that underpins much of this library. This structure eventually causes an OOM error, especially when fed a lot of data quickly.

Coda Hale's metrics library is a much better example of how to do similar things without binary heaps.

I personally stopped using this library as soon as I discovered the problem and so I don't have a vested interest in fixing it. So I'll leave this note here and let the authors decide how they'd like to proceed.

Update: the thing is that if you use meters (which use a timer to periodically aggregate the data) and then swap them out periodically to avoid a meter going out of range, you get dangling timers.

In addition, if you use many meters, say 100 in an app, then you have 100 timers running. Does that affect performance?
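
Here is a hedged sketch of that rotation pattern, assuming the package is installed as measured (newer versions publish measured-core) and that Meter exposes mark() and end() (end() is discussed further down in this thread):

```js
// Hypothetical reproduction of the "swap meters periodically" pattern.
const { Meter } = require('measured');

let current = new Meter();

// Rotate to a fresh meter every minute so the rate doesn't "go out of range".
setInterval(() => {
  const old = current;
  current = new Meter();
  // Without the next line, the old meter's internal interval keeps ticking and
  // keeps `old` reachable forever (the dangling timer described above).
  old.end();
}, 60 * 1000);

// Simulate traffic.
setInterval(() => current.mark(), 10);
```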

eugef commented

Any update on this issue, does anyone run this library with Node 8 or 10?

@eugef I'm in the process of writing some Gatling performance tests to validate or invalidate this.

@jondot, @maxcountryman

Can you help me understand this issue better so I can do some TDD, reproduce the issue, and verify that it has been resolved?

> I believe the source of these leaks is setInterval and perhaps the usage of unref.

Do you have an example of what you were doing when you saw the leak?

> Update: the thing is that if you use meters (which use a timer to periodically aggregate the data) and then swap them out periodically to avoid a meter going out of range, you get dangling timers.

What do you mean by the above?

I just want to either write a failing test so I can do TDD, or, if that isn't possible, know how to reproduce the issue so I can manually verify a fix.

When I wrote a naive performance test, I saw memory follow a sawtooth pattern, indicating that garbage collection was happening and memory was not growing linearly.
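
For reference, a naive probe along those lines might look like the following (a hypothetical harness, assuming the measured package and Meter#mark(); a sawtooth in the output indicates normal GC, while monotonic growth indicates a leak):

```js
// Hypothetical load/memory probe; not the actual Gatling test mentioned above.
const { Meter } = require('measured');

const meter = new Meter();

// Hammer the meter.
setInterval(() => {
  for (let i = 0; i < 100000; i++) meter.mark();
}, 10);

// Periodically print heap usage.
setInterval(() => {
  const mb = process.memoryUsage().heapUsed / 1024 / 1024;
  console.log(`heapUsed: ${mb.toFixed(1)} MB`);
}, 5000);
```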

Closing this issue as I can't reproduce it, feel free to re-open with steps to reproduce.

I know I already closed this issue. Just in case anyone finds themselves here:

My suspicion, based on some of the comments above, is that users were creating Meters regularly and not managing their lifecycles properly.

When a Meter is created you must call Meter.end() when you are done with it. The end method clears the interval, which allows the garbage collector to fully destroy the object.
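
A minimal lifecycle sketch, assuming the measured package and Meter#mark()/Meter#end() as described above:

```js
const { Meter } = require('measured');

const requests = new Meter();
requests.mark(); // record events while the meter is in use

// ... later, when the meter is no longer needed:
requests.end();  // clears the internal interval so the meter can be garbage collected
```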

When you use the Collection or the Self Reporting Metrics Registry, shutdown methods are exposed to properly end their lifecycle and that of the registered metrics: SelfReportingMetricsRegistry#shutdown and Collection#end.
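
For example, a shutdown sketch using a Collection (assuming Collection#meter() exists for creating metrics; Collection#end() is the method named above, and SelfReportingMetricsRegistry#shutdown would be the analogous call for the registry):

```js
const { Collection } = require('measured');

const collection = new Collection('http');
const rps = collection.meter('requestsPerSecond');
rps.mark();

// On application shutdown, end the collection so all registered metrics
// (and their interval timers) are cleaned up.
process.on('SIGTERM', () => {
  collection.end();
});
```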