Figure out why `LockFreePool` appears to cause unintended object retention (~= memory leak)
cowtowncoder opened this issue · 13 comments
Based on reported problems like:
it looks as if the new recycler pool default (see #1117) -- Lock-free Pool -- results in (at least in some cases) unbounded, increasing memory retention over time. From the symptoms it would appear as if actual releases to the pool happen more often than successful reuse from the pool, as otherwise the two operations should generally balance out over time (resulting in a maximum pool size relative to maximum concurrent use).
We need to figure out what could cause such behavior or, failing that, what the root cause of the observed problems is.
In the meantime we also will need to do #1256 to allow use of 2.17(.1) without this issue.
/cc @mariofusco @pjfanning -- I hope to learn if there is some specific usage scenario, and to create a multi-threaded reproduction. I suspect that somehow reuse from the pool is failing, but perhaps not for every call. jackson-benchmarks, for example, does not exhibit this problem: but it also does not use heavy concurrency by default, and tests do not run long enough to necessarily accumulate humongous buffers.
Also: it is interesting that the one concern I did have wrt unbounded pools seems to have ended up happening.
So it feels like my insistence on ideally having a maximum size to avoid this problem (... although doing so might hide an issue too) was warranted. For whatever that is worth.
Hi @cowtowncoder, I would like to investigate this memory issue, but at the moment I have no clue how to reproduce it on my side. I also asked here and I'm waiting for an answer. I'm reviewing its implementation again, and in all honesty it seems quite straightforward, so at the moment I don't know what to do about it. As for making the LockFree pool bounded, this is of course feasible, but I'm against it for a few reasons:
- In general I don't like blind fixes
- In this specific case making the pool bounded won't help us investigate the problem, it will only sweep it under the carpet
- Making the pool bounded won't be free either, both adding some further complexity to it and introducing a performance cost
- We already have a bounded pool implementation, with performance very similar to the LockFree one, so if we want a bounded pool as the default implementation, let's just use it
Thinking twice, now I see an admittedly weird situation where the LockFree pool can grow indefinitely. In that pool there is a heuristic to limit the number of retries under heavy contention: when too many retries fail, it gives up attempting to retrieve a resource from the pool and creates a new one instead. However, this heuristic is implemented symmetrically on both acquire and release, so similarly on release it gives up trying to put a resource back into the pool after a few failed attempts. In essence the pool will grow if there is a huge asymmetry in its usage: when acquire is always under heavy contention but release isn't. I'm not sure how to deal with this situation to be honest, but I would still like to see it happen in the wild before deciding on any action.
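To make the asymmetry concrete, here is a minimal sketch of that kind of give-up behavior (a simplified Treiber-stack pool, not the actual LockFreePool code; names and the retry limit are made up for illustration):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Simplified stack-based pool that gives up after a fixed number of failed
// CAS attempts on both acquire and release. If acquire is heavily contended
// but release is not, every failed acquire allocates a fresh object that a
// later successful release adds to the pool, so the pool keeps growing.
class GiveUpPool<T> {
    private static final int MAX_CAS_ATTEMPTS = 3; // hypothetical retry limit

    private final AtomicReference<Node<T>> head = new AtomicReference<>();
    private final Supplier<T> factory;

    GiveUpPool(Supplier<T> factory) { this.factory = factory; }

    T acquire() {
        for (int i = 0; i < MAX_CAS_ATTEMPTS; i++) {
            Node<T> h = head.get();
            if (h == null) {
                return factory.get();            // pool empty: allocate
            }
            if (head.compareAndSet(h, h.next)) {
                return h.value;                  // reuse a pooled object
            }
        }
        return factory.get();                    // contended: give up and allocate
    }

    void release(T value) {
        Node<T> n = new Node<>(value);
        for (int i = 0; i < MAX_CAS_ATTEMPTS; i++) {
            Node<T> h = head.get();
            n.next = h;
            if (head.compareAndSet(h, n)) {
                return;                          // returned to the pool
            }
        }
        // contended: give up; the object is simply dropped and left to GC
    }

    int size() {                                 // approximate, for monitoring only
        int n = 0;
        for (Node<T> h = head.get(); h != null; h = h.next) { n++; }
        return n;
    }

    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }
}
```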
@mariofusco Your thoughts align remarkably closely with my thinking :)
(not that this guarantees we are necessarily right, but it is an ok sanity check I think)
As to reproduction, what I was thinking was indeed running a multi-threaded scenario with N threads, reading and writing, against a modified copy of the pool which allows accessing its size, printing it periodically and checking whether the size keeps growing.
I guess one unfortunate part is that any "lost" acquire will add to the size, and although in some use cases the opposite could occur too (release does not happen on some code path, I assume that is possible as well), there are probably cases where release works reliably, so even a small incremental imbalance will lead to increasing retention over time.
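A minimal sketch of the kind of stress test I have in mind, reusing the GiveUpPool sketch from the earlier comment (thread count, buffer size and run time are arbitrary):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// N threads hammer the pool with acquire/release cycles while the main thread
// prints an approximate pool size once per second; a steadily growing number
// would confirm the suspected retention.
public class PoolGrowthTest {
    private static final int THREADS = 100;

    public static void main(String[] args) throws Exception {
        GiveUpPool<byte[]> pool = new GiveUpPool<>(() -> new byte[8000]);
        ExecutorService exec = Executors.newFixedThreadPool(THREADS);
        for (int i = 0; i < THREADS; i++) {
            exec.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    byte[] buf = pool.acquire();
                    buf[0] = 1;                  // simulate a short read/write burst
                    pool.release(buf);
                }
            });
        }
        for (int sec = 0; sec < 60; sec++) {
            Thread.sleep(1000L);
            System.out.println("pooled count ~ " + pool.size());
        }
        exec.shutdownNow();
    }
}
```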
As to adding bounded size: I 100% agree that we need to understand the problem first and not add something that would likely just hide the problem.
Once we know what is going on we can reconsider it.
(I have some ideas on how it could work, but no actual plan; avoiding the need to use a single synchronized counter)
I have now read through the issue, and giving up under contention is probably not what we want.
Contention is an inherent problem of concurrent primitives (compareAndSet-like ones), but allocating a new object on a contended acquire while still being able to release it back to the pool can cause unbounded growth, which eventually consumes the CPU we tried so hard to save (spending it on GC marking and, eventually, moving, given that the pool doesn't shrink once it grows).
Right now the easier choice imo is to use a counter on acquire/release to keep on trying, spinning with Thread::onSpinWait or yielding if there are too many attempts, but without giving up.
Another solution could be to spread the contention or to use better primitives which "cannot fail", e.g. getAndSet.
But that means redesigning what it does; just serialising while too contended seems the easier choice to me.
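A rough sketch of what that looks like on the acquire side (again illustrative, not the actual implementation; the spin threshold is made up):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Instead of giving up after a fixed number of failed CAS attempts, keep
// retrying but back off: hint the CPU with onSpinWait() first, then yield
// the thread if contention persists. The loop never allocates a new object
// just because the CAS lost a race.
final class SpinningAcquire {
    private static final int SPIN_LIMIT = 64;    // hypothetical threshold

    static <T> T acquire(AtomicReference<Node<T>> head, Supplier<T> factory) {
        int attempts = 0;
        for (;;) {
            Node<T> h = head.get();
            if (h == null) {
                return factory.get();            // genuinely empty: allocate
            }
            if (head.compareAndSet(h, h.next)) {
                return h.value;                  // reuse pooled object
            }
            if (++attempts < SPIN_LIMIT) {
                Thread.onSpinWait();             // busy-wait hint (JDK 9+)
            } else {
                Thread.yield();                  // heavier back-off, but don't give up
            }
        }
    }

    static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }
}
```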
I may be wrong, but to me it seems like ConcurrentDequePool might be the safer but similar implementation to use -- it uses a Deque for storing pooled objects.
And compared to BoundedPool it has the benefit of lower overhead for cases where the pool itself is not used (since there are no allocations with a Deque while it is empty, it being a linked list).
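For reference, the general shape of a Deque-backed pool is roughly the following (a sketch under those assumptions, not the actual ConcurrentDequePool code):

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Supplier;

// A pool backed by ConcurrentLinkedDeque: acquire and release never "give up",
// they simply delegate to the deque's own non-blocking operations, so a
// bounded-retry CAS loop and its failure asymmetry are avoided entirely.
class DequeBackedPool<T> {
    private final Deque<T> deque = new ConcurrentLinkedDeque<>();
    private final Supplier<T> factory;

    DequeBackedPool(Supplier<T> factory) { this.factory = factory; }

    T acquire() {
        T pooled = deque.pollFirst();            // null if the pool is empty
        return (pooled != null) ? pooled : factory.get();
    }

    void release(T value) {
        deque.offerFirst(value);                 // always succeeds (unbounded)
    }

    int size() {
        return deque.size();                     // O(n) for ConcurrentLinkedDeque
    }
}
```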
@mariofusco @franz1981: I created #1265 which sort of hints at a possible problem, but isn't necessarily a smoking gun. When running through the 3 main implementations, LockFreePool occasionally gets a REALLY high pooled count -- I have seen 18,000 once (!) -- but seems to stabilize over time. The other implementations typically remain under the thread count (going over isn't unexpected, since the test runs reads and writes for some loops, so getting to 2x is theoretically possible).
So no, this does not show monotonically increasing behavior, but suggests that spikes are possible.
It is of course possible (likely?) that contention behavior is heavily platform-dependent, and maybe running from Eclipse on a Mac mini just doesn't exhibit extreme contention.
I'll see if increasing the thread count from 100 upwards (or going down to, say, 20) shakes out different behavior.
But if anyone else has time to run the test, and/or read the code and suggest changes, that'd be useful.
Yet another report with "memory leak" reproduction: https://github.com/kkolyan/jackson_leak.
It uses ObjectMapper from 1000 threads and over the course of an hour occupies an 8G heap (with initial occupancy by test data of 600M), leading to a permanent freeze.
@kkolyan Thank you very much for creating this! I hope to try it out locally, and see if the other 2 implementations (and in particular, the Deque-backed ones) fare better.
EDIT: looks like it's not trivial to change the impl to measure other impls due to RecyclerPoolSizeCalculator (I used 2.18.0-SNAPSHOT where RecyclerPool.pooledCount() was added).
@kkolyan Trying to run the reproduction, but since I am not a gradle user, I am not sure what invocation is expected. Guessing from build.gradle, I tried
./gradlew test
./gradlew application
neither of which works (the build succeeds with ./gradlew build).
It'd be good to add the invocation to the README.md of the repo?
Also had to add the mavenLocal() repository to use a locally built jackson-core 2.17.1-SNAPSHOT (or could have added the Sonatype OSS repository that has those builds too).
I'm able to run the test in the IDE just fine. Not yet seeing much of a trend, assuming I read the output correctly.
But one related thing is this: I backported the addition of
default int pooledCount() {
return -1;
}
in RecyclerPool for the 2.17 branch, so a locally built jackson-core (or one from the Sonatype OSS repo) has that available. This is useful to:
- Remove the need for RecyclerPoolSizeCalculator, since RecyclerPool.pooledCount() can be used
- Allow use of all RecyclerPool implementations to see that they don't (... or do?) exhibit similar leakage -- see the sketch below
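For example, something along these lines should then work for comparing implementations (a sketch assuming a jackson-core build that has pooledCount(); the JsonRecyclerPools factory methods used here are the existing ones, and a real test would plug the pool into the JsonFactory behind ObjectMapper rather than exercising it directly):

```java
import com.fasterxml.jackson.core.util.BufferRecycler;
import com.fasterxml.jackson.core.util.JsonRecyclerPools;
import com.fasterxml.jackson.core.util.RecyclerPool;

public class PooledCountExample {
    public static void main(String[] args) {
        // Any implementation can be swapped in here for comparison:
        RecyclerPool<BufferRecycler> pool = JsonRecyclerPools.newLockFreePool();
        // RecyclerPool<BufferRecycler> pool = JsonRecyclerPools.newConcurrentDequePool();
        // RecyclerPool<BufferRecycler> pool = JsonRecyclerPools.newBoundedPool(100);

        // Acquire a BufferRecycler from the pool and return it
        BufferRecycler br = pool.acquireAndLinkPooled();
        pool.releasePooled(br);

        // -1 means the implementation does not track its size
        System.out.println("pooled count: " + pool.pooledCount());
    }
}
```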
Ok, I think we have a reasonable idea as to how the retention may happen: due to imbalanced failures between acquiring a reusable buffer from the pool (fails more often) vs releasing a buffer back to the pool (fails less often).
Given this, closing the issue; will proceed with #1271.