runtime: eliminate stack rescanning
aclements opened this issue · 45 comments
One of the largest remaining contributors to GC STW time is stack rescanning. I have an approach for eliminating this entirely. This is a tracking bug for implementing this approach.
I will upload a design document and proof soon, and I have a working implementation that I plan to have cleaned up and mailed out in a day or two.
I'm marking this Go 1.9. My current plan is to get the change in for Go 1.8, but have a GODEBUG flag to fall back to the current algorithm for debugging purposes (and in case something goes wrong). Assuming things go smoothly, we'll actually rip out the stack rescanning code when Go 1.9 opens.
Edit: Design doc
Edit: Things to follow up on in Go 1.9+:
- Remove stack rescanning
- Remove (or replace) stack barriers and delete TestStackBarrierProfiling
- Remove debug.gcrescanstacks (Go 1.12, 198440c)
- Fix early mark termination race (Go 1.12, #26903)
- Remove work draining from mark termination and work.helperDrainBlock (Go 1.12, #26903)
- Revisit 100us wait in stopTheWorldWithSema (should happen as part of non-cooperative preemption)
- Revisit making the second shade conditional (and the condition for channel ops)
/cc @RLH
CL https://golang.org/cl/31362 mentions this issue.
CL https://golang.org/cl/31450 mentions this issue.
CL https://golang.org/cl/31451 mentions this issue.
CL https://golang.org/cl/31369 mentions this issue.
CL https://golang.org/cl/31453 mentions this issue.
CL https://golang.org/cl/31452 mentions this issue.
CL https://golang.org/cl/31457 mentions this issue.
CL https://golang.org/cl/31454 mentions this issue.
CL https://golang.org/cl/31367 mentions this issue.
CL https://golang.org/cl/31368 mentions this issue.
CL https://golang.org/cl/31456 mentions this issue.
CL https://golang.org/cl/31455 mentions this issue.
CL https://golang.org/cl/31366 mentions this issue.
CL https://golang.org/cl/31550 mentions this issue.
CL https://golang.org/cl/31572 mentions this issue.
CL https://golang.org/cl/31570 mentions this issue.
CL https://golang.org/cl/31571 mentions this issue.
CL https://golang.org/cl/31655 mentions this issue.
We used the "double barrier" to address stack scanning issues in the IBM J9 implementation of Metronome. It's in section 4.3 of "Design and implementation of a comprehensive real-time java virtual machine" by Auerbach et al. In that case we were incrementalizing over many threads' stacks, as opposed to the individual stack, but it solves the same problem.
It worked well, although the extra barrier overhead was annoying. Let me know if you have any questions.
david (dfb@google.com)
@davidfbacon, thanks for the reference! Indeed, that looks like the same barrier design. It's good to know that it worked well for Metronome.
I'll update the proposal document to add a citation.
@aclements any numbers on the performance impact of the new barrier? Also, are they harder to eliminate at compile time?
@rasky, it's about a 1.7% performance hit on the x/benchmarks garbage benchmark (which, as the name suggests, is designed to hammer the garbage collector). I haven't checked, but I suspect we've gained more than that from other optimizations since Go 1.7.
They are harder to eliminate at compile time. I completely disabled the optimizations that don't carry over directly to the hybrid barrier and binaries got about 1% larger. We could eliminate some of these, but it requires flow analysis and the current insertion code doesn't do any flow analysis. OTOH, the places where we can eliminate write barriers with the current write barrier aren't all that common anyway (how often do you write the address of a global to something?), so we're not losing much.
CL https://golang.org/cl/31764 mentions this issue.
CL https://golang.org/cl/31766 mentions this issue.
CL https://golang.org/cl/31765 mentions this issue.
CL https://golang.org/cl/31763 mentions this issue.
CL https://golang.org/cl/31820 mentions this issue.
CL https://golang.org/cl/31890 mentions this issue.
Thanks for the clue. The design doc has been updated. It's comforting to know that this path has been traveled and no gotchas materialized.
CL https://golang.org/cl/32033 mentions this issue.
Thank you for the design doc. It looks really great!
For channel operations, the shade(ptr) is necessary if either the source stack or the destination stack is grey.
I'm wondering whether this is necessary.
- If the source stack is black, ptr cannot be an unprotected white pointer, so there is no need to shade it.
- If the destination stack is grey, ptr will be scanned eventually when the destination stack becomes black (or ptr becomes dead before it).
So maybe we only need to shade(ptr) when the source stack is grey and the destination stack is black. Of course it is ok to just conservatively shade it in Go 1.8.
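To make the two conditions in the comment above easier to compare, here is a minimal sketch that enumerates the four stack-color combinations. It assumes a stack is either grey (not yet scanned) or black (scanned), so "not grey" means black. The function names shadeDesignDoc and shadeProposed are hypothetical, chosen for illustration only, and are not runtime functions.

```go
package main

import "fmt"

// shadeDesignDoc models the condition quoted from the design doc:
// shade ptr if either the source or the destination stack is grey.
func shadeDesignDoc(srcGrey, dstGrey bool) bool {
	return srcGrey || dstGrey
}

// shadeProposed models the narrower condition suggested in the comment:
// shade ptr only if the source stack is grey and the destination stack
// is black (assumed here to mean "not grey").
func shadeProposed(srcGrey, dstGrey bool) bool {
	return srcGrey && !dstGrey
}

func main() {
	for _, src := range []bool{false, true} {
		for _, dst := range []bool{false, true} {
			fmt.Printf("srcGrey=%-5v dstGrey=%-5v designDoc=%-5v proposed=%v\n",
				src, dst, shadeDesignDoc(src, dst), shadeProposed(src, dst))
		}
	}
}
```

In this model the proposed condition is a strict subset of the design doc's condition: it shades in one of the four cases instead of three.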
CL https://golang.org/cl/32093 mentions this issue.
CL https://golang.org/cl/32095 mentions this issue.
CL https://golang.org/cl/32186 mentions this issue.
The hybrid barrier is now live on master.
Detailed benchmark results are in the commit message for bd640c8. The high level summary is that this reduces worst-case STW time to about 100 µs and typical 95%ile STW time to 50 µs (assuming, of course, that the OS doesn't get in the way and that the system isn't otherwise overloaded). Performance impact is about 1% on average and goes up to about 5% for pointer-intensive workloads. However, we've more than paid for this with other throughput improvements in Go 1.8, so nearly all of the benchmarks still show a performance gain relative to Go 1.7.
@cherrymui, I think you're right that channel operations only need the second shade if the source stack is grey and the destination stack is black. For 1.8 we're always performing the second shade (channel operations or not), which is more conservative for correctness and makes it easier to have the GODEBUG setting to fall back to the current behavior (or a superset thereof). But I'll definitely consider your observation in depth for Go 1.9 when I look at making the second shade conditional.
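As a rough illustration of what "always performing the second shade" means, the following toy model contrasts the unconditional second shade described for Go 1.8 with a conditional variant that shades the new pointer only when the writing goroutine's stack is grey. This is a sketch based on my reading of the barrier in the design doc, not the runtime's code: the object type, the color type, and the writePointer signature with its stackGrey and conditional parameters are all hypothetical.

```go
package main

import "fmt"

type color int

const (
	white color = iota // not yet reached by the collector
	grey               // reached, but not yet scanned
	black              // reached and scanned
)

func (c color) String() string { return [...]string{"white", "grey", "black"}[c] }

// object is a toy heap object that only tracks its mark color.
type object struct {
	color color
}

// shade greys a white object; nil and already-marked objects are left alone.
func shade(o *object) {
	if o != nil && o.color == white {
		o.color = grey
	}
}

// writePointer is a toy hybrid barrier. The first shade covers the value
// being overwritten. The second shade covers the new pointer and runs
// always when conditional is false (the Go 1.8 behavior described above),
// or only when the current stack is grey when conditional is true.
func writePointer(slot **object, ptr *object, stackGrey, conditional bool) {
	shade(*slot)
	if !conditional || stackGrey {
		shade(ptr)
	}
	*slot = ptr
}

func main() {
	for _, conditional := range []bool{false, true} {
		old, next := &object{}, &object{}
		slot := old
		// The writing goroutine's stack is modeled as already black.
		writePointer(&slot, next, false, conditional)
		fmt.Printf("conditional=%v: old=%v next=%v\n", conditional, old.color, next.color)
	}
}
```

With a black stack, the unconditional variant greys both the old and the new object, while the conditional variant greys only the old one.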
CL https://golang.org/cl/36621 mentions this issue.
CL https://golang.org/cl/36619 mentions this issue.
CL https://golang.org/cl/36620 mentions this issue.
Austin, looks like this can be closed? I'm trying to close or remilestone all Go1.9Early bugs.
Ping @aclements : Should this issue be kept open?
Change https://golang.org/cl/134318 mentions this issue: runtime: eliminate mark 2 and fix mark termination race
Change https://golang.org/cl/134785 mentions this issue: runtime: eliminate gchelper mechanism
Change https://golang.org/cl/134777 mentions this issue: runtime: remove GODEBUG=gcrescanstacks=1 mode