Deadlock in BinaryLogClient?
fivetran-andydreyfuss opened this issue · 3 comments
I think there may be a deadlock in the code when a term is received. Here is my stack trace
Just to keep things simple I'll use:
K=keepAliveThreadExecutorLock
C=connectLock
In the below
- the "update" thread
- holds K
- waits on its executor thread "blc-keepalive-127.0.0.1:55820"
- the "blc-keepalive-127.0.0.1:55820" thread
- waits on C
- the "blc-127.0.0.1:55820" thread holds C and is waiting on K
- holds C
- waits on K
Here's the abridged stack. Am I missing something?
// holds connectLock, waiting on keepAliveThreadExecutorLock held by "update"
"blc-127.0.0.1:55820" Id=355 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@34c87dc0 owned by "update" Id=101
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@34c87dc0
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at com.github.shyiko.mysql.binlog.BinaryLogClient.isKeepAliveThreadRunning(BinaryLogClient.java:828)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:571)
at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:859)
at java.lang.Thread.run(Thread.java:748)
Number of locked synchronizers = 1
- java.util.concurrent.locks.ReentrantLock$NonfairSync@42687457
// holds keepAliveThreadExecutorLock waiting on connectLock held by blc-keepalive
"update" Id=101 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@51c55d90
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@51c55d90
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)
// waiting on connectLock held by "blc-127.0.0.1:55820"
"blc-keepalive-127.0.0.1:55820" Id=123 TIMED_WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@42687457 owned by "blc-127.0.0.1:55820" Id=355
at sun.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@42687457
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
at com.github.shyiko.mysql.binlog.BinaryLogClient.tryLockInterruptibly(BinaryLogClient.java:1215)
at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect(BinaryLogClient.java:1209)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:881)
at com.github.shyiko.mysql.binlog.BinaryLogClient$5.run(BinaryLogClient.java:803)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Number of locked synchronizers = 1
- java.util.concurrent.ThreadPoolExecutor$Worker@6ea3a369
at com.github.shyiko.mysql.binlog.BinaryLogClient.awaitTerminationInterruptibly(BinaryLogClient.java:1200)
at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateKeepAliveThread(BinaryLogClient.java:1189)
at com.github.shyiko.mysql.binlog.BinaryLogClient.disconnect(BinaryLogClient.java:1177)
at com.REDACTED_COMPANY.integrations.mysql.BinlogClientLifecycleListener.onDisconnect(BinlogClientLifecycleListener.java:64)
at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:586)
at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.connectWithFallback(MysqlIncrementalUpdater.java:200)
at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.lambda$syncBinlogPage$2(MysqlIncrementalUpdater.java:181)
at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater$$Lambda$395/857253661.call(Unknown Source)
at com.REDACTED_COMPANY.integrations.db.Retry.actRoot(Retry.java:68)
at com.REDACTED_COMPANY.integrations.db.Retry.act(Retry.java:85)
at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.syncBinlogPage(MysqlIncrementalUpdater.java:160)
at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.incrementalUpdate(MysqlIncrementalUpdater.java:114)
at com.REDACTED_COMPANY.integrations.db.DbService.incrementalUpdate(DbService.java:156)
at com.REDACTED_COMPANY.integrations.db.DbService.lambda$update$2(DbService.java:86)
at com.REDACTED_COMPANY.integrations.db.DbService$$Lambda$379/1327179836.run(Unknown Source)
at com.REDACTED_COMPANY.integrations.db.DbService.update(DbService.java:104)
at com.REDACTED_COMPANY.core.Service.update(Service.java:113)
at com.REDACTED_COMPANY.donkey.Pipeline.lambda$update$9(Pipeline.java:376)
at com.REDACTED_COMPANY.donkey.Pipeline$$Lambda$340/786970485.run(Unknown Source)
at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
at java.lang.Thread.run(Thread.java:748)
Number of locked synchronizers = 1
- java.util.concurrent.locks.ReentrantLock$NonfairSync@34c87dc0
PS: I was able to repro by setting the keep alive interval to 1 and the heartbeat interval to 10...
Thanks, Andy! Fix included in 0.21.0.
@shyiko Hello - I was wondering if you might be able to give some context on what specifically about the two commits in 0.21.0 was intended to fix the deadlock issue brought up here. I was attempting to add the fix in to the maintained Osheroff fork of this library but the two commits contain a number of unrelated changes too. Any guidance you could provide would be very appreciated!