Deadlock issue when tracking impressions
Closed this issue · 2 comments
We have a high throughput Spring-based web application that frequently encounters a deadlock scenario in the Split SDK, ultimately requiring restarting the web application to resolve.
Specifically, all threads get stuck with the following stack trace:
deployment.redrock.ear//io.split.client.impressions.UniqueKeysTrackerImp.track(UniqueKeysTrackerImp.java:59)
deployment.redrock.ear//io.split.client.impressions.strategy.ProcessImpressionNone.process(ProcessImpressionNone.java:28)
deployment.redrock.ear//io.split.client.impressions.ImpressionsManagerImpl.track(ImpressionsManagerImpl.java:116)
deployment.redrock.ear//io.split.client.SplitClientImpl.recordStats(SplitClientImpl.java:342)
deployment.redrock.ear//io.split.client.SplitClientImpl.getTreatmentWithConfigInternal(SplitClientImpl.java:258)
deployment.redrock.ear//io.split.client.SplitClientImpl.getTreatmentWithConfig(SplitClientImpl.java:99)
We encountered this a few months before and decided to switch to use None
as our impressionsMode
assuming that would solve the issue (we don't care about impressions anyway). However, digging into the code it appears the ProcessImpressionsNone.java still uses the UniqueKeysTracker which is backed by a ConcurrentHashMap
, ultimately deadlocking on the call to put
.
To solve this issue, could we this issue be resolved in one of the two following ways:
- Provide us with a true "no-op" impressions tracker that disables handling impressions entirely (e.g. the
process
method is empty) - Fix the deadlock issue within the UniqueKeysTracker - either by not using a
ConcurrentHashMap
or wrap theput
call on the ConcurrentHashMap with some kind of timeout?
As mentioned, impressions are not incredibly important to us and the application not deadlocking is our biggest goal.
For what its worth, we are using version 4.7.0
of the client and looking at the changes since then I do not see any changes that would lead me to believe this issue has been addressed in the latest version (4.11.0
).
I took a closer look at this and since we are running version 4.7.0
of the client, the place where the threads are stuck:
deployment.redrock.ear//io.split.client.impressions.UniqueKeysTrackerImp.track(UniqueKeysTrackerImp.java:59)
actually aligns with the first line of the UniqueKeysTrackerImp.track method which tells me other threads are stuck trying to get into this method because it is synchronized
.
I'm not sure what could be blocking in this method, but does it really need to be synchronized
?
In the meantime, I've opened this PR to completely disable processing impressions (basically implementing option 1 in my original post). Please review at your earliest convenience.
Hi @ryaneorth ,
Sorry for the inconvenience on experiencing deadlock issues. We have released JAVA 4.11.1
which is a revamp of the None process logic.
I will proceed on closing this issue, feel free to reopen if you are experiencing some issue again.
Thanks,
Matias