apache/accumulo

Concurrent Tablet refresh and unload hangs VolumeIT

Closed this issue · 1 comment

When I run VolumeIT at commit cb25f26, the testNonConfiguredVolumes test times out (it is the second test that executes for me). I found the following in the TabletServer log:

2024-04-19T17:10:11,920 57 [tablet.location] DEBUG: Loading !0;~< on ip-1.2.3.4:9997
2024-04-19T17:10:11,985 88 [tablet.location] DEBUG: Loaded !0;~< on ip-1.2.3.4:9997
2024-04-19T17:10:12,325 70 [tablet.walogs] TRACE: !0;~< has unflushed data in wals: [46d8704c-cd67-4750-b04e-9586d85a9aa4] 
2024-04-19T17:10:12,529 169 [tserver.UnloadTabletHandler] INFO : Tablet unload for extent !0;~< requested.
2024-04-19T17:10:12,584 169 [tablet.Tablet] DEBUG: Tablet !0;~< was refreshed because MINC_COMPLETION. Files removed: [] Files added: [F00000e3.rf]
2024-04-19T17:10:12,584 169 [tablet.files] DEBUG: Flushed !0;~< created {"path":"file:/home/dlmari2/workspace/accumulo/test/target/mini-tests/org.apache.accumulo.test.VolumeIT_testNonConfiguredVolumes/volumes/v2/tables/!0/table_info/F00000e3.rf","startRow":"","endRow":""} from [memory]
2024-04-19T17:10:12,585 169 [tablet.Tablet] DEBUG: Tablet !0;~< had no dir, creating file:/home/dlmari2/workspace/accumulo/test/target/mini-tests/org.apache.accumulo.test.VolumeIT_testNonConfiguredVolumes/volumes/v3/tables/!0/table_info
2024-04-19T17:10:12,613 169 [tablet.walogs] TRACE: !0;~< has unflushed data in wals: [] 
2024-04-19T17:10:12,625 169 [tablet.Tablet] DEBUG: Unable to refresh tablet !0;~< for MINC_COMPLETION because the tablet is closed
2024-04-19T17:10:12,625 169 [tablet.Tablet] ERROR: Failed to free tablet memory on !0;~<
java.lang.IllegalStateException: null
        at org.apache.accumulo.tserver.tablet.TabletMemory.finalizeMinC(TabletMemory.java:116) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:424) [classes/:?]
        at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:114) [classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:921) [classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:781) [classes/:?]
        at org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:89) [classes/:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
2024-04-19T17:10:12,630 169 [tablet.MinorCompactionTask] ERROR: Unknown error during minor compaction for extent: !0;~<
java.lang.RuntimeException: Exception occurred during minor compaction on !0;~<
        at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:420) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:114) [classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:921) [classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:781) [classes/:?]
        at org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:89) [classes/:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.IllegalStateException: Failed to refresh !0;~<
        at com.google.common.base.Preconditions.checkState(Preconditions.java:601) ~[guava-33.0.0-jre.jar:?]
        at org.apache.accumulo.tserver.tablet.Tablet.refreshMetadata(Tablet.java:1704) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.bringMinorCompactionOnline(Tablet.java:1556) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:407) ~[classes/:?]
        ... 9 more
2024-04-19T17:10:12,630 169 [tserver.UnloadTabletHandler] ERROR: Failed to close tablet !0;~<... Aborting migration
java.lang.RuntimeException: Exception occurred during minor compaction on !0;~<
        at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:420) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.MinorCompactionTask.run(MinorCompactionTask.java:114) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.completeClose(Tablet.java:921) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:781) ~[classes/:?]
        at org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:89) [classes/:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.IllegalStateException: Failed to refresh !0;~<
        at com.google.common.base.Preconditions.checkState(Preconditions.java:601) ~[guava-33.0.0-jre.jar:?]
        at org.apache.accumulo.tserver.tablet.Tablet.refreshMetadata(Tablet.java:1704) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.bringMinorCompactionOnline(Tablet.java:1556) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.minorCompact(Tablet.java:407) ~[classes/:?]
        ... 9 more
2024-04-19T17:10:12,630 169 [tserver.UnloadTabletHandler] INFO : Tablet unload for extent !0;~< requested.
2024-04-19T17:10:14,301 58 [tablet.Tablet] DEBUG: Unable to refresh tablet !0;~< for REFRESH_RPC because the tablet is closed
2024-04-19T17:10:14,302 58 [tserver.TabletClientHandler] DEBUG: Unable to refresh tablet : !0;~<
2024-04-19T17:10:14,404 83 [tablet.Tablet] DEBUG: Unable to refresh tablet !0;~< for REFRESH_RPC because the tablet is closed
2024-04-19T17:10:14,405 83 [tserver.TabletClientHandler] DEBUG: Unable to refresh tablet : !0;~<
2024-04-19T17:10:14,663 81 [tablet.Tablet] DEBUG: Unable to refresh tablet !0;~< for REFRESH_RPC because the tablet is closed
2024-04-19T17:10:14,663 81 [tserver.TabletClientHandler] DEBUG: Unable to refresh tablet : !0;~<
2024-04-19T17:10:15,001 67 [tablet.Tablet] DEBUG: Unable to refresh tablet !0;~< for REFRESH_RPC because the tablet is closed
2024-04-19T17:10:15,001 67 [tserver.TabletClientHandler] DEBUG: Unable to refresh tablet : !0;~<
...
(repeats forever)
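
Reading the traces above, the failure on thread 169 appears to follow this sequence: the unload calls close, the close path runs a minor compaction, the compaction tries to refresh the tablet for MINC_COMPLETION, the refresh is skipped because the tablet is already closed, and a precondition that expects the refresh to succeed throws IllegalStateException. The following is only a minimal sketch of that sequence. SketchTablet and ClosedTabletRefreshSketch are hypothetical names, the ordering inside close() is an assumption inferred from the log, and none of this is Accumulo's actual code.

```java
// Hypothetical model of the sequence seen in the log; not Accumulo's real classes.
final class SketchTablet {
  private boolean closed = false;

  // Mirrors the log line "Unable to refresh tablet ... because the tablet is closed":
  // a refresh requested on a closed tablet is skipped and reports failure.
  boolean refresh(String reason) {
    if (closed) {
      return false;
    }
    return true;
  }

  // Stand-in for the precondition that produced "Failed to refresh" in the trace.
  void bringMinorCompactionOnline() {
    boolean refreshed = refresh("MINC_COMPLETION");
    if (!refreshed) {
      throw new IllegalStateException("Failed to refresh");
    }
  }

  // Assumption: the tablet is already marked closed when the final flush runs.
  void close() {
    closed = true;
    bringMinorCompactionOnline();
  }
}

final class ClosedTabletRefreshSketch {
  // Runs the close path once and reports how it ended.
  static String run() {
    SketchTablet tablet = new SketchTablet();
    try {
      tablet.close();
      return "closed cleanly";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

In this toy model the close always fails, whereas in the report the failure only shows up on some runs. The sketch is meant to show why a refresh that refuses to run on a closed tablet conflicts with a caller that requires it to succeed, not to reproduce the timing.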

@dlmarion I was not able to reproduce this issue, but based on the logs you posted I tracked it down and I think #4483 will fix the bug.