EventGroup sometimes times out on start
Closed this issue · 5 comments
EventGroup#start
blocks until the core event loop has started. There have been multiple reports of performStart
timing out and throwing an exception. This is unexpected and hasn't been reproduced reliably.
One client reported setting the timeout to 60s and still seeing the timeout, so probably not a case of the timeout being too short.
Chronicle-Threads/src/main/java/net/openhft/chronicle/threads/EventGroup.java
Lines 394 to 412 in 8d15fc2
This appears to be class-loader related and probably due to a dependency between static initialisers that can arise. The stack-trace of the blocked event loop thread is as follows:
{"time":"20221109-23:26:50.921", "level":"WARN", "log":"Thread dump", "exceptionCause":"Stack trace main/core-event-loop", "threadName":"main/core-event-loop", "threadId":"30", "threadState":"RUNNABLE", "waitCount":"0", "waitTime":"-1", "blockedCount":"0", "blockedTime":"-1", "lockName":"null", "lockO
com.company.ThreadDumpStackTraceWrapper: Stack trace main/core-event-loop
at java.base@11.0.3/java.lang.ClassLoader.defineClass1(Native Method)
at java.base@11.0.3/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
at java.base@11.0.3/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base@11.0.3/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:802)
at java.base@11.0.3/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:700)
at java.base@11.0.3/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:623)
at java.base@11.0.3/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
at java.base@11.0.3/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base@11.0.3/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
at app//com.sun.jna.Native.initIDs(Native Method)
at app//com.sun.jna.Native.<clinit>(Native.java:222)
at app//com.sun.jna.Structure.setAlignType(Structure.java:280)
at app//com.sun.jna.Structure.<init>(Structure.java:197)
at app//com.sun.jna.Structure.<init>(Structure.java:193)
at app//com.sun.jna.Structure.<init>(Structure.java:180)
at app//com.sun.jna.Structure.<init>(Structure.java:172)
at app//net.openhft.affinity.impl.LinuxHelper$utsname.<init>(LinuxHelper.java:183)
at app//net.openhft.affinity.impl.LinuxHelper.<clinit>(LinuxHelper.java:37)
at app//net.openhft.affinity.impl.LinuxJNAAffinity.<clinit>(LinuxJNAAffinity.java:42)
at app//net.openhft.affinity.Affinity.isLinuxJNAAffinityUsable(Affinity.java:115)
at app//net.openhft.affinity.Affinity.<clinit>(Affinity.java:54)
at app//net.openhft.chronicle.core.threads.CleaningThread.run(CleaningThread.java:150)
I've seen this happen again a few times. I think perhaps we should do a JVM-wide thread dump when this happens to help troubleshoot it. It has been caused by a deadlock in static initialisers in the past, which was resolved by eagerly initialising some things.
Released in Chronicle-Threads-2.23.29, BOM-2.23.200
Released in Chronicle-Threads-2.24.15, BOM-2.24.88
Released in Chronicle-Threads-2.25ea1, BOM-2.25ea1