ClassCastException after reloading on one cluster node
Opened this issue · 3 comments
When running on a cluster, and the plugin gets reloaded on one of the nodes, exceptions like these get (eventually) logged:
2020.09.04 17:52:26 ERROR [message-archive-handler-2]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task
java.util.concurrent.ExecutionException: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_252]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) [?:1.8.0_252]
at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:272) ~[?:?]
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
Caused by: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
at org.jivesoftware.openfire.archive.cluster.GetConversationsWriteETATask.run(GetConversationsWriteETATask.java:52) ~[?:?]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]
at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) ~[?:?]
at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_252]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_252]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
at ------ submitted from ------.(Unknown Source) ~[?:?]
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126) ~[?:?]
at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79) ~[?:?]
at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:191) ~[?:?]
at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:88) ~[?:?]
at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:427) ~[?:?]
at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:716) ~[xmppserver-4.6.0-SNAPSHOT.jar:4.6.0-SNAPSHOT]
at org.jivesoftware.openfire.archive.ConversationManager.availabilityETA(ConversationManager.java:1045) ~[?:?]
at com.reucon.openfire.plugin.archive.xep0313.IQQueryHandler.lambda$handleIQ$2(IQQueryHandler.java:246) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_252]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_252]
One possible explanation is that the plugin does not get fully unloaded, when being stopped/removed/updated. If some kind of resource doesn't get properly released, a reference to old classes might linger, causing these kind of exceptions.
This can be tested for by performing a heap dump of an Openfire instance where the monitoring plugin was loaded, and then unloaded. All references should be gone.
I'm thinking that receiving tasks from other cluster nodes is what's preventing the classloader from being garbage collected. I've written up the details in https://stackoverflow.com/questions/63794387/can-tasks-sent-over-a-hazelcast-cluster-prevent-unloading-of-classes
A workaround for this issue: restart the senior cluster node after deploying a new instance of the monitoring plugin on it.