prometheus/client_golang

Is Counter.Add() thread-safe?

notrobpike opened this issue · 5 comments

https://github.com/prometheus/client_golang/blob/main/prometheus/counter.go#L137

Technically speaking there is no data race here. But if there are many threads trying to update the same counter, I can imagine that this never makes progress. Is it worth it to add a mechanism similar to Solaris' adaptive lock here? (Start from spinlock and after a time sleep on a lock?)

The impl here could be to loop some number of times, say 256, and then start sleeping a random amount of time between each loop. It's simple. The downside is that if the threads are stacking up then new threads -- that haven't reached the 256 limit yet -- will get serviced first and long-waiting threads get starved out. So to prevent that we add an atomic which is the number of waiters (anyone over 256 that is in the delayed loop) and if that is non-zero on entry then new threads also queue up right away. There's still some random factor and they aren't serviced in strict order but there's more fairness.

Same goes for Gauge.Add().

I haven't experienced any problem myself. I was just looking into the thread safety as I decided I want to call Add() concurrently in my program. So I mean if no one has ever complained about this it might not be worth any effort.

Hey there, @notrobpike! I'm here to help you out with any issues or questions you have. I can assist you with bug fixes, problem-solving, and even becoming a contributor. Let me know how I can assist you today!

The Counter.Add() method in the Prometheus Go client library is designed to be thread-safe. It uses atomic operations to ensure concurrency safety. Specifically, it employs atomic.AddUint64 for integer values and a combination of atomic.LoadUint64 and atomic.CompareAndSwapUint64 for floating-point values [1].

Regarding your concern about potential starvation when multiple threads update the same counter, the current implementation does not include an adaptive locking mechanism. While no issues have been reported, your suggestion to introduce a mechanism like Solaris' adaptive lock could enhance fairness among threads. This would involve transitioning from a spinlock to a sleep state after a certain number of attempts, potentially using an atomic counter to manage waiters. However, if no problems have been experienced, it might not be necessary to implement such changes.

To continue talking to Dosu, mention @dosu.

Those methods are indeed thread-safe. But, you (and our GenAI dosubot friend) are right that you could have contention on extreme amount of writers.

However, this only happens for adding floats, which happens rarely (usually ppl add integers or use Inc), plus you would really need thousands of go-routines or so to trigger this problem, unrealistic for Go applications.

You can watch an excellent talks explaining the details from @beorn7

Thanks for offering you help, but I would wait for any user actually hitting this problem in the real application, before committing to more complexity 🤗

What do you think?

If there are provable better ways of handling thread contention here, I would like to hear (with experimental evidence).

My conclusion many years ago was that we needed access to concepts like "core-local variables" (for lack of a better term, I'm sure there is one used by the actual experts). My vague understanding of the Go runtime is that it uses those concepts under the hood (for sync.Map etc.) but it doesn't expose those features in a way that it would be useable by Go code. I can imagine that this happens in the future, though, and then we could optimize the code for hardware with a lot of cores.

@bwplotka I'm surprised you'd need thousands of goroutines but I'll take your word for it. Agreed this problem should be incredibly rare in practice and not worth the complexity. I only disagree with dosubot's characterization as a starvation issue.

@beorn7 caveat: I'm speaking as very much a non-expert.

I don't believe sync.Map uses anything exotic as you suggest -- it just expects concurrent threads to write disjoint keys and therefore not contend for the same mutex. At a quick glance, it does its magic by using atomic CAS operations just like the prom client does here. In this way threads writing to disjoint keys always succeed the first time on the CAS for their kv data. However, the docs state that if keys overlap by writing threads, than better performance is to be had using more conventional mutex/rwmutex. This tells me that when there is contention it's better to stack up waiters on a mutex than to let multiple threads run all-out competing on CAS.

Whereas here, if there are multiple writing threads, that's exactly counter (get it? counter? heh) to the sync.Map use case recommendation.

It should be easy to generate a benchmark for the prom client, or refer to existing benchmarks for sync.Map.

I do believe the common (by far) use case would be a single or limited number of threads updating the same counter and therefore the current implementation using atomics is spot on.

Actually this problem was a little bit reproduced by Cockroach DB folks and improved in this PR #1661 for context. Perhaps they might have an opinion on further improvement in this place (: