AutoMQ/automq

[BUG] Netty PoolArena deadlock caused by StackOverflow

Found one Java-level deadlock:
=============================
"aws-java-sdk-NettyEventLoop-1-0":
  waiting for ownable synchronizer 0x000010000d172a88, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "data-plane-kafka-network-thread-2-ListenerName(PLAINTEXT)-PLAINTEXT-2"

"data-plane-kafka-network-thread-2-ListenerName(PLAINTEXT)-PLAINTEXT-2":
  waiting for ownable synchronizer 0x0000100079b35af0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "stream-reader-1"

"stream-reader-1":
  waiting for ownable synchronizer 0x000010000d172a88, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "data-plane-kafka-network-thread-2-ListenerName(PLAINTEXT)-PLAINTEXT-2"

Java stack information for the threads listed above:
===================================================
"aws-java-sdk-NettyEventLoop-1-0":
        at jdk.internal.misc.Unsafe.park(java.base@17.0.11/Native Method)
        - parking to wait for  <0x000010000d172a88> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(java.base@17.0.11/LockSupport.java:211)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.11/AbstractQueuedSynchronizer.java:715)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.11/AbstractQueuedSynchronizer.java:938)
        at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@17.0.11/ReentrantLock.java:153)
        at java.util.concurrent.locks.ReentrantLock.lock(java.base@17.0.11/ReentrantLock.java:322)
        at io.netty.buffer.PoolArena.lock(PoolArena.java:760)
        at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:195)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:139)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:129)
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
        at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:140)
        at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:120)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:150)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(java.base@17.0.11/Thread.java:840)
"data-plane-kafka-network-thread-2-ListenerName(PLAINTEXT)-PLAINTEXT-2":
        at jdk.internal.misc.Unsafe.park(java.base@17.0.11/Native Method)
        - parking to wait for  <0x0000100079b35af0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(java.base@17.0.11/LockSupport.java:211)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.11/AbstractQueuedSynchronizer.java:715)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.11/AbstractQueuedSynchronizer.java:938)
        at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@17.0.11/ReentrantLock.java:153)
        at java.util.concurrent.locks.ReentrantLock.lock(java.base@17.0.11/ReentrantLock.java:322)
        at io.netty.buffer.PoolChunk.free(PoolChunk.java:513)
        at io.netty.buffer.PoolChunkList.free(PoolChunkList.java:120)
        at io.netty.buffer.PoolArena.freeChunk(PoolArena.java:273)
        at io.netty.buffer.PoolArena.free(PoolArena.java:246)
        at io.netty.buffer.PooledByteBuf.deallocate(PooledByteBuf.java:175)
        at io.netty.buffer.AbstractReferenceCountedByteBuf.handleRelease(AbstractReferenceCountedByteBuf.java:111)
        at io.netty.buffer.AbstractReferenceCountedByteBuf.release(AbstractReferenceCountedByteBuf.java:101)
        at com.automq.stream.WrappedByteBuf.release(WrappedByteBuf.java:1049)
        at kafka.log.streamaspect.ElasticLogFileRecords$PooledMemoryRecords.release(ElasticLogFileRecords.java:376)
        at org.apache.kafka.common.record.PooledRecords.release(PooledRecords.java:37)
        at org.apache.kafka.common.record.RecordsSend.release(RecordsSend.java:87)
        at org.apache.kafka.common.record.MultiRecordsSend$$Lambda$3286/0x00007f0e5bbe6b38.accept(Unknown Source)
        at java.util.ArrayList.forEach(java.base@17.0.11/ArrayList.java:1511)
        at org.apache.kafka.common.record.MultiRecordsSend.release(MultiRecordsSend.java:150)
        at org.apache.kafka.common.network.NetworkSend.release(NetworkSend.java:55)
        at org.apache.kafka.common.network.KafkaChannel.maybeCompleteSend(KafkaChannel.java:412)
        at org.apache.kafka.common.network.Selector.write(Selector.java:650)
        at org.apache.kafka.common.network.Selector.attemptWrite(Selector.java:642)
        at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:598)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:486)
        at kafka.network.Processor.poll(SocketServer.scala:1114)
        at kafka.network.Processor.run(SocketServer.scala:1006)
        at java.lang.Thread.run(java.base@17.0.11/Thread.java:840)
"stream-reader-1":
        at jdk.internal.misc.Unsafe.park(java.base@17.0.11/Native Method)
        - parking to wait for  <0x000010000d172a88> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(java.base@17.0.11/LockSupport.java:211)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.11/AbstractQueuedSynchronizer.java:715)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.11/AbstractQueuedSynchronizer.java:938)
        at java.util.concurrent.locks.ReentrantLock$Sync.lock(java.base@17.0.11/ReentrantLock.java:153)
        at java.util.concurrent.locks.ReentrantLock.lock(java.base@17.0.11/ReentrantLock.java:322)
        at io.netty.buffer.PoolArena.lock(PoolArena.java:760)
        at io.netty.buffer.PoolArena.tcacheAllocateNormal(PoolArena.java:195)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:139)
        at io.netty.buffer.PoolArena.allocate(PoolArena.java:129)
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
        at com.automq.stream.s3.ByteBufAlloc.byteBuffer(ByteBufAlloc.java:113)
        at com.automq.stream.s3.StreamRecordBatchCodec.duplicateDecode(StreamRecordBatchCodec.java:59)
        at com.automq.stream.s3.ObjectReader$DataBlockGroup$1.next(ObjectReader.java:457)
        at com.automq.stream.s3.ObjectReader$DataBlockGroup$1.next(ObjectReader.java:439)
        at com.automq.stream.s3.cache.blockcache.DataBlock.getRecords(DataBlock.java:138)
        at com.automq.stream.s3.cache.blockcache.StreamReader.lambda$read0$7(StreamReader.java:186)
        at com.automq.stream.s3.cache.blockcache.StreamReader$$Lambda$3382/0x00007f0e5bc1f980.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture$UniAccept.tryFire(java.base@17.0.11/CompletableFuture.java:718)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.11/CompletableFuture.java:510)
        at java.util.concurrent.CompletableFuture.complete(java.base@17.0.11/CompletableFuture.java:2147)
        at com.automq.stream.s3.cache.blockcache.DataBlockCache$Cache.lambda$getBlock0$1(DataBlockCache.java:175)
        at com.automq.stream.s3.cache.blockcache.DataBlockCache$Cache$$Lambda$3404/0x00007f0e5bc2ce28.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@17.0.11/CompletableFuture.java:863)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@17.0.11/CompletableFuture.java:841)
        at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.11/CompletableFuture.java:510)
        at java.util.concurrent.CompletableFuture.complete(java.base@17.0.11/CompletableFuture.java:2147)
        at com.automq.stream.s3.cache.blockcache.DataBlock.complete(DataBlock.java:53)
        at com.automq.stream.s3.cache.blockcache.DataBlockCache$Cache.lambda$read$2(DataBlockCache.java:193)
        at com.automq.stream.s3.cache.blockcache.DataBlockCache$Cache$$Lambda$3402/0x00007f0e5bc2c9b8.accept(Unknown Source)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@17.0.11/CompletableFuture.java:863)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@17.0.11/CompletableFuture.java:841)
        at java.util.concurrent.CompletableFuture$Completion.run(java.base@17.0.11/CompletableFuture.java:482)
        at com.automq.stream.utils.threads.EventLoop.run(EventLoop.java:48)

Found 1 deadlock.

The deadlock report says 'stream-reader-1' holds the PoolChunk lock (0x0000100079b35af0), yet there is no PoolChunk frame anywhere in the current stack of 'stream-reader-1'.

The most plausible explanation is that a StackOverflowError was thrown earlier while 'stream-reader-1' held the PoolChunk lock, so the unlock was skipped and the lock was never released. The 'stream-reader-1' stack above shows CompletableFuture completions chained on the same thread (complete → postComplete → tryFire → ...), which is exactly the kind of call chain that can recurse deeply enough to exhaust the stack.
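
To illustrate the suspected mechanism, here is a hypothetical sketch (not the AutoMQ or Netty code; the class, method and thread names are made up, only the lock()/try/finally shape mirrors the pattern Netty uses around PoolArena/PoolChunk): when a StackOverflowError fires with almost no stack left, it can land inside ReentrantLock.lock() after the lock state has been set but before the method returns, or inside unlock() in the finally block. In either case the lock stays held even though the thread survives and keeps running other work, which matches a thread dump that shows a thread as the lock owner with no corresponding frame on its stack.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch only: LockLeakSketch, allocateLikePool and the thread name
// are invented for illustration.
public class LockLeakSketch {
    static final ReentrantLock LOCK = new ReentrantLock();

    static void allocateLikePool() {
        LOCK.lock();        // an SOE raised inside lock(), after the lock state is
        try {               // set but before it returns, skips the finally entirely
            // pretend to touch pooled memory here
        } finally {
            LOCK.unlock();  // an SOE raised inside unlock() leaves the lock held too
        }
    }

    // Burn stack until it overflows, then enter the critical section from the
    // catch handler, i.e. with only a handful of frames to spare.
    static void nearStackLimit() {
        try {
            nearStackLimit();
        } catch (StackOverflowError e) {
            allocateLikePool();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                nearStackLimit();
            } catch (StackOverflowError ignored) {
                // the thread survives, but the lock may now be stuck without an owner
            }
        }, "stream-reader-sim");
        t.setDaemon(true);
        t.start();
        t.join(2000);
        System.out.println("lock still held: " + LOCK.isLocked()
                + ", simulated thread still blocked: " + t.isAlive());
    }
}
```

Whether the lock actually leaks on a given run depends on exactly where the StackOverflowError lands, so this only demonstrates the hazard window, not a deterministic reproduction.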