dropwizard/metrics

SlidingTimeWindowMovingAverages data race

carterkozak opened this issue · 1 comments

I believe there's a race condition in SlidingTimeWindowMovingAverages where concurrent accessors may retain a stale currentBucketIndex value, and update unexpected buckets.

Callers call tickIfNecessary, then update (or access):

@Override
public void tickIfNecessary() {
final long oldTick = lastTick.get();
final long newTick = clock.getTick();
final long age = newTick - oldTick;
if (age >= TICK_INTERVAL) {
// - the newTick doesn't fall into the same slot as the oldTick anymore
// - newLastTick is the lower border time of the new currentBucketIndex slot
final long newLastTick = newTick - age % TICK_INTERVAL;
if (lastTick.compareAndSet(oldTick, newLastTick)) {
Instant currentInstant = Instant.ofEpochSecond(0L, newLastTick);
currentBucketIndex = normalizeIndex(calculateIndexOfTick(currentInstant));
cleanOldBuckets(currentInstant);
}
}
}

@Override
public void update(long n) {
buckets.get(currentBucketIndex).add(n);
}

In heavily concurrent environments (e.g. webservers), a compare-and-swap failure within tickIfNecessary (or even another thread completing the tick operation first, not necessarily in parallel) will lead to stale currentBucketIndex values depending on how the methods are optimized at runtime.
currentBucketIndex (and likely a couple other related fields) could be updated with the volatile modifier to avoid stale reads, however this comes with a minor performance penalty as the memory must be read each time it's accessed. It's unclear if this is an intentional tradeoff.

This issue is stale because it has been open 180 days with no activity. Remove the "stale" label or comment or this will be closed in 14 days.