igniterealtime/openfire-monitoring-plugin

Log flooded with invalid sample time for statistic messages


When running a cluster with two Openfire servers writing to the same database, the error log can be flooded with messages from the openfire-monitoring plugin about invalid timestamps for statistic samples, e.g.

2020.06.05 11:37:19 ERROR [pool-monitoring2]: org.jivesoftware.openfire.reporting.stats.StatsEngine - Error sampling for statistic sessions
org.jrobin.core.RrdException: Bad sample timestamp 1591357020. Last update time was 1591357020, at least one second step is required
 at org.jrobin.core.RrdDb.store(RrdDb.java:587) ~[jrobin-1.5.9.jar!/:?]
 at org.jrobin.core.Sample.update(Sample.java:228) ~[jrobin-1.5.9.jar!/:?]
 at org.jivesoftware.openfire.reporting.stats.StatsEngine$SampleTask.run(StatsEngine.java:394) [monitoring-2.0.0.jar!/:?]
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:1.8.0_231]
 at java.util.concurrent.FutureTask.run(Unknown Source) [?:1.8.0_231]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:1.8.0_231]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:1.8.0_231]
 at java.lang.Thread.run(Unknown Source) [?:1.8.0_231]
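To make the failure mode concrete, here is a minimal Java sketch. It does not use JRobin: FakeRrd is a hypothetical stand-in that only enforces the rule quoted in the exception (a sample must be at least one second newer than the last update). It assumes both cluster nodes compute the same sample timestamp for the same interval and write to one shared database; the timestamp in the log, 1591357020, is a multiple of 60, which is consistent with (but does not prove) minute-aligned sampling.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedRrdDemo {

    /**
     * Hypothetical stand-in for a JRobin database. It only models the rule from
     * the log: a new sample must be strictly newer than the last update time.
     */
    static class FakeRrd {
        private long lastUpdateTime;

        FakeRrd(long lastUpdateTime) {
            this.lastUpdateTime = lastUpdateTime;
        }

        void store(long timestamp) {
            if (timestamp <= lastUpdateTime) {
                throw new IllegalStateException("Bad sample timestamp " + timestamp
                        + ". Last update time was " + lastUpdateTime
                        + ", at least one second step is required");
            }
            lastUpdateTime = timestamp;
        }
    }

    /** Two nodes store a sample with the same timestamp into one shared database. */
    static List<String> simulate(long timestamp) {
        FakeRrd shared = new FakeRrd(timestamp - 60); // previous sample one minute earlier
        List<String> results = new ArrayList<>();
        for (String node : new String[] {"node-1", "node-2"}) {
            try {
                shared.store(timestamp);
                results.add(node + ": ok");
            } catch (IllegalStateException e) {
                results.add(node + ": " + e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        simulate(1591357020L).forEach(System.out::println);
    }
}
```

The first node's store succeeds; the second node's store at the same timestamp fails with the same message as in the log above.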

The code in the StatsEngine class (line 363) is supposed to prevent this, but all it does is... add more logging!

                    // We want to double check the last sample time recorded in the db so as to
                    // prevent the log files from being inundated if more than one instance of
                    // Openfire is updating the same database. Also, if there is a task taking a
                    // long time to complete
                    if(newTime <= db.getLastArchiveUpdateTime()) {
                        Log.warn("Sample time of " + newTime +  " for statistic " + key + " is " +
                                "invalid.");
                    }

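A control flow statement (probably 'continue') seems to be missing after the warning. As it stands, the check logs the problem but still falls through to the update at StatsEngine.java:394, which then throws the RrdException shown above.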
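A minimal sketch of the check with the missing 'continue' added, assuming the check sits inside a loop over statistics (as the "for statistic " + key message suggests). SampleLoopSketch and sampleAll are hypothetical names; the map stands in for the per-statistic databases, with each value playing the role of db.getLastArchiveUpdateTime(). Only the "sessions" key comes from the log; "other-stat" is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SampleLoopSketch {

    /** Returns the statistics that were sampled; stale ones are warned about and skipped. */
    static List<String> sampleAll(Map<String, Long> lastUpdateTimes, long newTime) {
        List<String> sampled = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastUpdateTimes.entrySet()) {
            String key = entry.getKey();
            if (newTime <= entry.getValue()) {
                // Another node already stored a sample for this slot: warn and skip,
                // instead of falling through to an update that would throw.
                System.out.println("Sample time of " + newTime + " for statistic " + key + " is invalid.");
                continue;
            }
            // Hypothetical placeholder for the real update (Sample.update() in the plugin).
            entry.setValue(newTime);
            sampled.add(key);
        }
        return sampled;
    }

    public static void main(String[] args) {
        Map<String, Long> dbs = new LinkedHashMap<>();
        dbs.put("sessions", 1591357020L);   // already updated by the other node
        dbs.put("other-stat", 1591356960L); // still one minute behind
        System.out.println(sampleAll(dbs, 1591357020L));
    }
}
```

With the 'continue' in place, the stale statistic produces a single warning line and is skipped, while the other statistic is still sampled.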

This issue will be fixed in the next release of Monitoring (2.0.2 or 2.1.0).