shyiko/mysql-binlog-connector-java

Deadlock in BinaryLogClient?

fivetran-andydreyfuss opened this issue · 3 comments

I think there may be a deadlock in the code when a term is received. Here is my stack trace

Just to keep things simple I'll use:
K=keepAliveThreadExecutorLock
C=connectLock

In the below

  • the "update" thread
    • holds K
    • waits on its executor thread "blc-keepalive-127.0.0.1:55820"
  • the "blc-keepalive-127.0.0.1:55820" thread
    • waits on C
  • the "blc-127.0.0.1:55820" thread holds C and is waiting on K
    • holds C
    • waits on K

Here's the abridged stack. Am I missing something?

// holds connectLock, waiting on keepAliveThreadExecutorLock held by "update"
"blc-127.0.0.1:55820" Id=355 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@34c87dc0 owned by "update" Id=101
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@34c87dc0
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.isKeepAliveThreadRunning(BinaryLogClient.java:828)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:571)
	at com.github.shyiko.mysql.binlog.BinaryLogClient$7.run(BinaryLogClient.java:859)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 1
	- java.util.concurrent.locks.ReentrantLock$NonfairSync@42687457

// holds keepAliveThreadExecutorLock waiting on connectLock held by blc-keepalive
"update" Id=101 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@51c55d90
	at sun.misc.Unsafe.park(Native Method)
	-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@51c55d90
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1465)
	at java.util.concurrent.Executors$DelegatedExecutorService.awaitTermination(Executors.java:675)

            // waiting on connectLock held by "blc-127.0.0.1:55820"
            "blc-keepalive-127.0.0.1:55820" Id=123 TIMED_WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@42687457 owned by "blc-127.0.0.1:55820" Id=355
                at sun.misc.Unsafe.park(Native Method)
                -  waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@42687457
                at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)
                at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)
                at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)
                at com.github.shyiko.mysql.binlog.BinaryLogClient.tryLockInterruptibly(BinaryLogClient.java:1215)
                at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateConnect(BinaryLogClient.java:1209)
                at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:881)
                at com.github.shyiko.mysql.binlog.BinaryLogClient$5.run(BinaryLogClient.java:803)
                at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:748)

                Number of locked synchronizers = 1
                - java.util.concurrent.ThreadPoolExecutor$Worker@6ea3a369

	at com.github.shyiko.mysql.binlog.BinaryLogClient.awaitTerminationInterruptibly(BinaryLogClient.java:1200)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.terminateKeepAliveThread(BinaryLogClient.java:1189)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.disconnect(BinaryLogClient.java:1177)
	at com.REDACTED_COMPANY.integrations.mysql.BinlogClientLifecycleListener.onDisconnect(BinlogClientLifecycleListener.java:64)
	at com.github.shyiko.mysql.binlog.BinaryLogClient.connect(BinaryLogClient.java:586)
	at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.connectWithFallback(MysqlIncrementalUpdater.java:200)
	at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.lambda$syncBinlogPage$2(MysqlIncrementalUpdater.java:181)
	at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater$$Lambda$395/857253661.call(Unknown Source)
	at com.REDACTED_COMPANY.integrations.db.Retry.actRoot(Retry.java:68)
	at com.REDACTED_COMPANY.integrations.db.Retry.act(Retry.java:85)
	at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.syncBinlogPage(MysqlIncrementalUpdater.java:160)
	at com.REDACTED_COMPANY.integrations.mysql.MysqlIncrementalUpdater.incrementalUpdate(MysqlIncrementalUpdater.java:114)
	at com.REDACTED_COMPANY.integrations.db.DbService.incrementalUpdate(DbService.java:156)
	at com.REDACTED_COMPANY.integrations.db.DbService.lambda$update$2(DbService.java:86)
	at com.REDACTED_COMPANY.integrations.db.DbService$$Lambda$379/1327179836.run(Unknown Source)
	at com.REDACTED_COMPANY.integrations.db.DbService.update(DbService.java:104)
	at com.REDACTED_COMPANY.core.Service.update(Service.java:113)
	at com.REDACTED_COMPANY.donkey.Pipeline.lambda$update$9(Pipeline.java:376)
	at com.REDACTED_COMPANY.donkey.Pipeline$$Lambda$340/786970485.run(Unknown Source)
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1626)
	at java.lang.Thread.run(Thread.java:748)

	Number of locked synchronizers = 1
	- java.util.concurrent.locks.ReentrantLock$NonfairSync@34c87dc0

PS: I was able to repro by setting the keep alive interval to 1 and the heartbeat interval to 10...

Thanks, Andy! Fix included in 0.21.0.

@shyiko Hello - I was wondering if you might be able to give some context on what specifically about the two commits in 0.21.0 was intended to fix the deadlock issue brought up here. I was attempting to add the fix in to the maintained Osheroff fork of this library but the two commits contain a number of unrelated changes too. Any guidance you could provide would be very appreciated!