igniterealtime/openfire-monitoring-plugin

NullPointerException when loading plugin on senior cluster node

Closed this issue · 7 comments

Just after loading a new version of the plugin on the senior cluster node, this got logged:

2020.09.04 12:41:34 ERROR [hz.openfire.cached.thread-1]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Unexpected exception running CallableTask[org.jivesoftware.openfire.archive.cluster.SendConversationEventsTask]
java.lang.NullPointerException: null
        at org.jivesoftware.openfire.archive.cluster.SendConversationEventsTask.run(SendConversationEventsTask.java:63) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.0.jar!/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]
2020.09.04 12:41:35 INFO  [pool-29118-thread-1]: com.reucon.openfire.plugin.archive.impl.JdbcPersistenceManager - Running database table migration to split full JID values into bare JID and resource-part components.
2020.09.04 12:41:35 DEBUG [pool-29118-thread-1]: com.reucon.openfire.plugin.archive.impl.JdbcPersistenceManager - Find columns with full JIDs in the fromJID column: these are rows that have NULL in the resource column, and a slash in the JID column.
2020.09.04 12:41:47 WARN  [hz.openfire.SlowOperationDetectorThread]: com.hazelcast.spi.impl.operationexecutor.slowoperationdetector.SlowOperationDetector - [172.31.25.193]:5701 [openfire] [3.12.5] Slow operation detected: com.hazelcast.executor.impl.operations.MemberCallableTaskOperation
Hint: You can enable the logging of stacktraces with the following system property: -Dhazelcast.slow.operation.detector.stacktrace.logging.enabled
2020.09.04 12:41:50 WARN  [hz.openfire.SlowOperationDetectorThread]: com.hazelcast.spi.impl.operationexecutor.slowoperationdetector.SlowOperationDetector - [172.31.25.193]:5701 [openfire] [3.12.5] Slow operation detected: com.hazelcast.executor.impl.operations.MemberCallableTaskOperation (2 invocations)

And a bit later:

2020.09.04 12:45:21 ERROR [hz.openfire.cached.thread-7]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Unexpected exception running CallableTask[org.jivesoftware.openfire.archive.cluster.SendConversationEventsTask]
java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at org.jivesoftware.openfire.archive.cluster.SendConversationEventsTask.run(SendConversationEventsTask.java:61) ~[monitoring-2.0.2-SNAPSHOT.jar!/:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.0.jar!/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]

What appears to be happening is that a cluster task is executed while various components are not yet (or no longer) properly configured, either during startup or shutdown

I wonder if this is a side-effect of having #113 run in the background, preventing full initialization of some components.

I often see this (used to?) with clustered plugins. It's generally best to only start clustering operations once clustering has started (!)

A naive plugin will start on "initialize" - which may be before clustering has started. So if clustering starts in the middle of plugin initialization, things start to break.

Although very likely true, that was not the case with this instance: I deployed a plugin in a node that was already running (for a couple of hours, at least).

This gets logged repeatedly on a cluster node where I've unloaded the plugin that's providing the task that's throwing this exception.

Explicitly shutting down tasks before unloading the plugin might help.

2020.09.04 14:21:07 ERROR [hz.openfire.cached.thread-5]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Unexpected exception running CallableTask[org.jivesoftware.openfire.archive.cluster.SendConversationEventsTask]
java.lang.NullPointerException: null
        at org.jivesoftware.openfire.archive.cluster.SendConversationEventsTask.run(SendConversationEventsTask.java:63) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.0.jar!/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]

I'm thinking that the ClassCastException is unrelated to the NullPointerException that's the subject of this issue. I've raised a new issue for the ClassCastException: #115