igniterealtime/openfire-monitoring-plugin

GetConversationsWriteETATask throws exceptions in a cluster

Closed this issue · 5 comments

While running version 2.2.0 in a cluster (Openfire 4.6.1), I've seen these exceptions. Each was logged on a different cluster node.

2021.01.06 15:21:45 ERROR [hz.openfire.cached.thread-9]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Unexpected exception running CallableTask[org.jivesoftware.openf
java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at org.jivesoftware.openfire.archive.cluster.GetConversationsWriteETATask.run(GetConversationsWriteETATask.java:64) ~[monitoring-2.2.0.jar!/:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) [hazelcast-2.5.0.jar!/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_265]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) [hazelcast-3.12.5.jar!/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) [hazelcast-3.12.5.jar!/:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) [hazelcast-3.12.5.jar!/:?]
2021.01.06 15:21:45 ERROR [message-archive-handler-13]: org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory - Failed to execute cluster task
java.util.concurrent.ExecutionException: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_265]
        at java.util.concurrent.FutureTask.get(FutureTask.java:192) [?:1.8.0_265]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:272) ~[?:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_265]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
Caused by: java.lang.ClassCastException: org.jivesoftware.openfire.plugin.MonitoringPlugin cannot be cast to org.jivesoftware.openfire.plugin.MonitoringPlugin
        at org.jivesoftware.openfire.archive.cluster.GetConversationsWriteETATask.run(GetConversationsWriteETATask.java:64) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory$CallableTask.call(ClusteredCacheFactory.java:591) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_265]
        at com.hazelcast.executor.impl.DistributedExecutorService$CallableProcessor.run(DistributedExecutorService.java:270) ~[?:?]
        at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:227) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_265]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_265]
        at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:64) ~[?:?]
        at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:80) ~[?:?]
        at ------ submitted from ------.(Unknown Source) ~[?:?]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolve(InvocationFuture.java:126) ~[?:?]
        at com.hazelcast.spi.impl.operationservice.impl.InvocationFuture.resolveAndThrowIfException(InvocationFuture.java:79) ~[?:?]
        at com.hazelcast.spi.impl.AbstractInvocationFuture.get(AbstractInvocationFuture.java:191) ~[?:?]
        at com.hazelcast.util.executor.DelegatingFuture.get(DelegatingFuture.java:88) ~[?:?]
        at org.jivesoftware.openfire.plugin.util.cache.ClusteredCacheFactory.doSynchronousClusterTask(ClusteredCacheFactory.java:427) ~[?:?]
        at org.jivesoftware.util.cache.CacheFactory.doSynchronousClusterTask(CacheFactory.java:722) ~[xmppserver-4.6.1.jar:4.6.1]
        at org.jivesoftware.openfire.archive.ConversationManager.availabilityETA(ConversationManager.java:1049) ~[?:?]
        at com.reucon.openfire.plugin.archive.xep0313.IQQueryHandler.lambda$handleIQ$2(IQQueryHandler.java:253) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_265]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_265]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_265]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_265]

This problem affects version 2.2.0 only. It occurs when the Monitoring plugin is reloaded (or replaced). A workaround for the issue is to restart Openfire after the plugin has been reloaded, which will resolve the issue until the plugin gets reloaded again.

The problem seems to be caused by this plugin suppying Runnable instances to an ArchiveManager instance that's provided by Openfire. Even though the Runnable tasks seem to be shut down properly when the plugin gets unloaded, there's something in Openfire that is retaining a reference to them. My suspicion is that this is caused by the threads that were used to execute the tasks (which do not get shut down immediately, given that the ExecutorService that's used has a strategy to cache them for a while).

When the reloaded plugin gets instantiated, it will cause a new PluginClassLoader to be instantiated (as expected). The old instance isn't garbage collected (as something still references classes loaded by it), which opens the door for the ClassCastExceptions to occur.

This issue is related to igniterealtime/openfire-pushnotification-plugin#19, which defines a problem with classes loaded by a PluginClassLoader used in a cache. For the monitoring plugin, we seem to run into a similar issue with classes used for clustered tasks (instead of caches). @GregDThomas suggested to use serialization to get around the caching issues. Would something similar be applicable to tasks?

I believe that we can close this, as the problem (plugin-provided classes being serialized) was tackled in the fixes for #120 / #170.