OpenHFT/Chronicle-Threads

Loop block monitor stops detecting blockages after a while

Closed this issue · 4 comments

Loop block monitor has a field called printBlockTimeNS which (I think) is there to prevent too much logging for a single blockage. It appears to allow logging at 1.4x the duration of the last log time, up to 20x the configured monitor interval.
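To make the backoff concrete, here is a minimal sketch of how I read it. This is not the library's code: the class name BlockReportBackoff, the shouldLog method and the integer arithmetic are all made up for illustration, and it assumes the threshold itself is what grows by 1.4x after each log.

```java
// Hypothetical sketch of the logging backoff described above; not the Chronicle-Threads implementation.
final class BlockReportBackoff {
    private final long intervalNS;   // configured monitor interval
    private long printBlockTimeNS;   // current threshold before the next log line

    BlockReportBackoff(long intervalNS) {
        this.intervalNS = intervalNS;
        this.printBlockTimeNS = intervalNS;
    }

    /** Returns true if a blockage of this duration should be logged now. */
    boolean shouldLog(long blockedNS) {
        if (blockedNS < printBlockTimeNS)
            return false;
        // Assumption: grow the threshold by 1.4x after each log, capped at 20x the interval.
        printBlockTimeNS = Math.min(printBlockTimeNS * 14 / 10, 20 * intervalNS);
        return true;
    }

    /** Restores the threshold to the configured interval (what a reset is expected to do). */
    void resetTimers() {
        printBlockTimeNS = intervalNS;
    }

    long printBlockTimeNS() {
        return printBlockTimeNS;
    }
}
```

With this model, if resetTimers() is never called the threshold stays at 20x the interval once it gets there, so shorter blockages are no longer reported.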

I assume this is supposed to get reset (so blockages will again be reported at the configured interval) each time an iteration completes, but that doesn't appear to be happening. If you put a log message in resetTimers() it doesn't appear to ever get called.

I believe the issue is the condition that guards the call to net.openhft.chronicle.threads.ThreadHolder#resetTimers:

        if (startedNS == 0 || startedNS == Long.MAX_VALUE) {
            thread.resetTimers();
            return false;
        }

For a MediumEventLoop, this will only trigger if the check happens to run during the pauser.pause() that occurs at the end of a non-busy iteration. If the event loop is always busy, the condition never evaluates to true.
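A rough way to see this in isolation is below. MonitorCheckSketch is a made-up stand-in, not the real monitor: it only mirrors the quoted condition and counts how often the reset branch is taken, with resetCalls standing in for thread.resetTimers().

```java
// Hypothetical stand-in that mirrors only the quoted condition; not the library's monitor class.
final class MonitorCheckSketch {
    int resetCalls; // counts how often the reset branch is taken

    /**
     * startedNS is the value the monitor samples from the event loop.
     * Returns true if the monitor would go on to check for a blockage.
     */
    boolean check(long startedNS) {
        if (startedNS == 0 || startedNS == Long.MAX_VALUE) {
            resetCalls++; // stands in for thread.resetTimers()
            return false;
        }
        return true;
    }
}
```

Feeding this only real timestamps (as an always-busy loop would) leaves resetCalls at 0 however many samples are taken. It only increments when a sample happens to land on one of the two sentinel values.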

As part of this fix we should beef up testing for LBM as it's an important piece of infrastructure.

tgd commented

Thanks Nick - I'll start looking at this.