igniterealtime/openfire-monitoring-plugin

ClassCastException after reloading on one cluster node


When the plugin is reloaded on one node of a cluster, exceptions like the following are (eventually) logged:

2020.09.04 17:52:26 ERROR [message-archive-handler-2]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task
java.util.concurrent.ExecutionException: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_252]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:272) ~[?:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
Caused by: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at org.jivesoftware.openfire.archive.cluster.GetConversationsWriteETATask.run(GetConversationsWriteETATask.java:52) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) ~[?:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
        at ------ submitted from ------.(Unknown Source) ~[?:?]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126) ~[?:?]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79) ~[?:?]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:191) ~[?:?]
        at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:88) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:427) ~[?:?]
        at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:716) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
        at org.jivesoftware.openfire.archive.ConversationManager.availabilityETA(ConversationManager.java:1045) ~[?:?]
        at com.reucon.openfire.plugin.archive.xep0313.IQQueryHandler.lambda$handleIQ$2(IQQueryHandler.java:246) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]

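One possible explanation is that the plugin does not get fully unloaded when it is stopped, removed, or updated. If some resource isn't properly released, a reference to the old classes can linger. A class loaded by the old plugin classloader is a different runtime type from the same-named class loaded by the new one, which would explain why MonitoringPlugin "cannot be cast to" MonitoringPlugin.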
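To illustrate why two classes with the same name can be incompatible, here is a minimal, standalone sketch. It is not Openfire or plugin code: `DemoPlugin`, `ReloadedClassDemo`, and the temp-directory setup are hypothetical names made up for the demo, and it assumes a JDK (not just a JRE) so that the system Java compiler is available. It compiles a trivial class once, loads it through two independent classloaders (standing in for the "old" and "new" plugin classloaders), and then attempts a cast between them.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReloadedClassDemo {

    /** Compiles a trivial class named DemoPlugin (hypothetical) into a fresh temp directory. */
    static Path compileDemoPlugin() throws IOException {
        Path dir = Files.createTempDirectory("reload-demo"); // not cleaned up; fine for a demo
        Path source = dir.resolve("DemoPlugin.java");
        Files.write(source, "public class DemoPlugin {}".getBytes(StandardCharsets.UTF_8));

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no compiler is present
        if (compiler == null) {
            throw new IllegalStateException("This demo needs a JDK");
        }
        int status = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed");
        }
        return dir;
    }

    /** Loads DemoPlugin through two unrelated classloaders and tries to cast across them. */
    static String castAcrossLoaders() throws Exception {
        URL[] classpath = { compileDemoPlugin().toUri().toURL() };

        // Parent is null, so neither loader delegates to the other or to the application loader.
        try (URLClassLoader oldLoader = new URLClassLoader(classpath, null);
             URLClassLoader newLoader = new URLClassLoader(classpath, null)) {

            Class<?> oldClass = oldLoader.loadClass("DemoPlugin");
            Class<?> newClass = newLoader.loadClass("DemoPlugin");
            Object oldInstance = oldClass.getDeclaredConstructor().newInstance();

            try {
                // Same class name, same bytes, different defining loader.
                newClass.cast(oldInstance);
                return "cast succeeded";
            } catch (ClassCastException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(castAcrossLoaders());
    }
}
```

The cast fails even though both classes come from the same `.class` file, because the JVM identifies a class by its name together with its defining classloader. In the stack trace above, the equivalent situation would be an instance created by the old plugin classloader being cast to the class as seen by the new one (or the reverse).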

This can be tested by taking a heap dump of an Openfire instance in which the monitoring plugin was loaded and then unloaded. No references to the plugin's classes or its classloader should remain.
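As a lighter-weight complement to a heap dump, a weak reference can show whether a classloader is still strongly reachable. The sketch below is illustrative only and does not use any Openfire API: `ClassLoaderLeakCheck`, `LINGERING`, and `loadAndUnload` are hypothetical names, and an empty `URLClassLoader` stands in for the plugin classloader. For simplicity it leaks the loader itself. In a real leak, holding any object whose class was defined by that loader (a queued task, for example) has the same effect.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClassLoaderLeakCheck {

    // Hypothetical stand-in for any long-lived structure (task queue, cache, listener list)
    // that might keep a reference after the plugin has been unloaded.
    static final List<Object> LINGERING = new ArrayList<>();

    /** Creates a stand-in "plugin classloader", optionally leaks it, and returns a weak reference to it. */
    static WeakReference<ClassLoader> loadAndUnload(boolean leak) {
        ClassLoader pluginLoader = new URLClassLoader(new URL[0]);
        if (leak) {
            LINGERING.add(pluginLoader); // simulates a resource that was never released
        }
        // The local strong reference goes out of scope when this method returns.
        return new WeakReference<>(pluginLoader);
    }

    /** Requests garbage collection a few times and reports whether the referent was cleared. */
    static boolean isCollected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc(); // only a request; the JVM is free to ignore it
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("released loader collected: " + isCollected(loadAndUnload(false)));
        System.out.println("leaked loader collected:   " + isCollected(loadAndUnload(true)));
    }
}
```

Because `System.gc()` is only a hint, a loader that is not reported as collected is not proof of a leak. A heap dump remains the more reliable way to find out what is holding the reference.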

I suspect that receiving tasks from other cluster nodes is what prevents the classloader from being garbage collected. I've written up the details at https://stackoverflow.com/questions/63794387/can-tasks-sent-over-a-hazelcast-cluster-prevent-unloading-of-classes

A workaround for this issue: restart the senior cluster node after deploying a new instance of the monitoring plugin on it.