ClassCastException after reloading on one cluster node

Question

Opened this issue 4 years ago · 3 comments

When running in a cluster, if the plugin gets reloaded on one of the nodes, exceptions like these are (eventually) logged:

2020.09.04 17:52:26 ERROR [message-archive-handler-2]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task
java.util.concurrent.ExecutionException: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_252]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:272) ~[?:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
Caused by: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at org.jivesoftware.openfire.archive.cluster.GetConversationsWriteETATask.run(GetConversationsWriteETATask.java:52) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) ~[?:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
        at ------ submitted from ------.(Unknown Source) ~[?:?]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126) ~[?:?]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79) ~[?:?]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:191) ~[?:?]
        at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:88) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:427) ~[?:?]
        at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:716) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
        at org.jivesoftware.openfire.archive.ConversationManager.availabilityETA(ConversationManager.java:1045) ~[?:?]
        at com.reucon.openfire.plugin.archive.xep0313.IQQueryHandler.lambda$handleIQ$2(IQQueryHandler.java:246) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]

Answer 1 · 2020-09-04T18:01:29.000Z

One possible explanation is that the plugin does not get fully unloaded when it is stopped, removed, or updated. If some resource isn't properly released, a reference to the old classes might linger, causing this kind of exception.
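To illustrate why a class can fail to be cast to a class of the same name: the JVM identifies a class by its name together with the classloader that defined it. The following is a minimal, self-contained sketch and not Openfire or Hazelcast code. `Plugin` and `IsolatingLoader` are hypothetical stand-ins, and the sketch assumes the compiled `.class` file is readable as a resource, as in a normal `javac`/`java` run.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReloadCastDemo {

    /** Hypothetical stand-in for a plugin class; not the real MonitoringPlugin. */
    public static class Plugin {}

    /** Defines its own copy of Plugin instead of delegating to the parent loader. */
    static class IsolatingLoader extends ClassLoader {
        private final String target = Plugin.class.getName();

        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // everything else: normal delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** True if an instance created by one loader is an instance of the class from another. */
    static boolean castableAcrossLoaders() throws Exception {
        ClassLoader parent = ReloadCastDemo.class.getClassLoader();
        // "before" and "after" mimic the plugin's classloader before and after a reload.
        Class<?> before = new IsolatingLoader(parent).loadClass(Plugin.class.getName());
        Class<?> after = new IsolatingLoader(parent).loadClass(Plugin.class.getName());
        Object oldInstance = before.getDeclaredConstructor().newInstance();
        return after.isInstance(oldInstance);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("castable across loaders: " + castableAcrossLoaders());
    }
}
```

If anything on a node still hands out instances created by the old classloader after a reload, code loaded by the new classloader would hit this kind of mismatch.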

This can be tested by taking a heap dump of an Openfire instance in which the monitoring plugin was loaded and then unloaded. No references to the plugin's classes or its classloader should remain.
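The same idea can be shown in miniature without a heap dump: a classloader can only be garbage collected once nothing strongly references it or any class or object it defined. The sketch below is a hedged illustration, not Openfire code. `LoaderLeakProbe`, `registry`, and `newLoader` are hypothetical names, and `registry` merely stands in for whatever cache, listener list, or queued task might keep a plugin object alive. Because `System.gc()` is only a request, the "released" result is not guaranteed on every JVM.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class LoaderLeakProbe {

    /** Hypothetical stand-in for a long-lived structure that outlives the plugin. */
    static final List<Object> registry = new ArrayList<>();

    /** Creates a throwaway classloader; optionally leaks a strong reference to it. */
    static WeakReference<ClassLoader> newLoader(boolean leak) {
        ClassLoader loader = new URLClassLoader(new URL[0], null);
        if (leak) {
            registry.add(loader); // simulates a resource that was never released
        }
        return new WeakReference<>(loader);
    }

    /** Requests GC a few times and reports whether the referent was cleared. */
    static boolean isCollected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("leaked loader collected:   " + isCollected(newLoader(true)));
        System.out.println("released loader collected: " + isCollected(newLoader(false)));
    }
}
```

In a real heap dump, the equivalent check is to search for instances of the plugin's classloader and follow the path to the GC roots to see what is holding on to it.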

Answer 2 · 2020-09-08T13:01:09.000Z

I'm thinking that receiving tasks from other cluster nodes is what's preventing the classloader from being garbage collected. I've written up the details in https://stackoverflow.com/questions/63794387/can-tasks-sent-over-a-hazelcast-cluster-prevent-unloading-of-classes

Answer 3 · 2020-09-08T13:11:03.000Z

A workaround for this issue: restart the senior cluster node after deploying a new instance of the monitoring plugin on it.