Summary is not threadsafe
Closed this issue · 1 comments
vigoo commented
The ConcurrentSummary#observe
function is not thread-safe:
def observe(value: Double, t: java.time.Instant): Unit = {
if (currentCount.intValue() == maxSize) {
values.removeFirst()
} else {
currentCount.increment()
}
values.add((t, value))
count.increment()
sum.add(value)
()
}
I have seen a crash in production on the removeFirst
line with a summary where it's maxSize
was set to 1
, my assumption is that the currentCount.intValue() == maxSize
check was evaluated on two fibers in parallel and both tried to remove the single item. (Could not reproduce yet)
Another stress test to show the issue:
object Test extends App {
override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] = {
val summary = MetricAspect.observeSummary("test", 10.minutes, 100, 0.0, Chunk.empty)
(for {
_ <- PrometheusClient.snapshot
.flatMap(snapshot => console.putStrLn(snapshot.value))
.repeat(Schedule.fixed(10.seconds))
.forkDaemon
_ <- (ZIO.succeed(0.1) @@ summary).forever.forkDaemon
_ <- (ZIO.succeed(-0.1) @@ summary).forever
} yield ExitCode.success).provideCustomLayer(PrometheusClient.live)
}
}
With this test the currentCount
and the length of values
in ConcurrentSummary
jumps over maxSize
(usually) and grows infinitely.
jdegoes commented
This is definitely not thread-safe. Summaries should be using arrays that never change length.