sync: eliminate global Mutex in Pool operations
bcmills opened this issue · 3 comments
In the review for https://golang.org/cl/101715 (“regexp: use sync.Pool to cache regexp.machine objects”), @ianlancetaylor notes:
A similar change was made in https://golang.org/cl/44150043, by [@bradfitz]. It was rolled back in https://golang.org/cl/48190043. [@dvyukov] said: "The remaining concern is: Are there cases where a program creates lots of local short-lived regexps (one-shot), and it's not possible to replace them with global ones? For this case we've introduced a global contended mutex."
I'm guessing that comment refers to allPoolsMu
here:
Line 242 in 2e84dc2
It strikes me as odd that sync.Pool
should have a global Mutex
at all. After all, sync.Pool
explicitly cooperates with the garbage collector (and will likely cooperate even more after #22950), and the garbage collector will trace all of the live sync.Pool
instances in the course of a normal garbage collection.
The allPools
slice appears to exist so that the garbage collector can clear all of the sync.Pool
instances before tracing them. We have to identify Pool
s before tracing them so that we don't over-retain pooled objects, but it seems like we could do that without acquiring global locks beyond the usual GC safepoints.
For example, we could add newly-allocated pools to a per-G list, and move that list to the global list only when the thread running that G acquires the scheduler lock.
If we happen to miss a new Pool
on the current GC cycle (for example, because its goroutine wasn't descheduled until near the end of the cycle), that's ok: we'll just wait until the next GC cycle to clear it.
That could make sync.Pool
substantially more efficient for objects that tend to be long-lived but are sometimes short-lived too (such as compiled regexps).
For example, we could add newly-allocated pools to a per-G list
I tried to solve the problem today.
My guess is that per-G list is a P local store, so there are two ways to implement it.
Scheme 1 is to modify the runtime to support, scheme 2 is to get a function that returns pid from the runtime via linkname, and use this function to create P local storage.
I tried plan 2, if you use map to implement P local storage, you still need a lock to synchronize. You can also try to implement P local storage with slicing.
Finally found that perhaps can use similar to the https://go-review.googlesource.com/c/go/+/552515 API, use two sharded to solve the problem.
Note: If GOMAXPOCS change, sharded is not 100% guaranteed that the same P operates use the same memory, and different Ps operate use different memory. This means that using sharded simply makes the lock no longer global, making the competition for the same lock less intense.
These are my attempts.
I shared it to help solve the issue.
Change https://go.dev/cl/562336 mentions this issue: sync: eliminate global Mutex in (*Pool).pinSlow operations