Using Fibers causes epic crash
withinboredom opened this issue · 13 comments
Minimal code to reproduce:
<?php
do {
    $running = false;
    //$running = frankenphp_handle_request(function (): void {
        $fiber = new Fiber(function() {
            echo "Starting Fiber\n";
        });
        $fiber->start();
    //});
} while ($running);
With some slight modifications, it can also be reproduced in worker mode.
@dunglas the following Dockerfile (props @cdaguerre in #374) appears to "fix" fibers, at least for this reproducer with manual testing. It needs more testing:
FROM dunglas/frankenphp:latest-builder-php8.3-alpine AS builder
COPY --from=caddy:builder-alpine /usr/bin/xcaddy /usr/bin/xcaddy
ENV CGO_ENABLED=1 XCADDY_SETCAP=1 CGO_CXXFLAGS=-fPIE CGO_CFLAGS=-fPIE CGO_LDFLAGS=-pie XCADDY_GO_BUILD_FLAGS='-buildmode=pie -ldflags="-w -s" -trimpath'
RUN xcaddy build \
    --output /usr/local/bin/frankenphp \
    --with github.com/dunglas/frankenphp=./ \
    --with github.com/dunglas/frankenphp/caddy=./caddy/ \
    --with github.com/dunglas/mercure/caddy \
    --with github.com/dunglas/vulcain/caddy
🥳 🤞 🤞 still testing...
Great news! Don't hesitate to open a PR with these changes, so we can see if this fixes the issue for all architectures.
I'll do some proper testing by Monday (by updating the fiber branch), but I haven't seen a crash yet via manual testing.
@withinboredom I had issues with fibers too, so I could also test this on my Cloud Run service, but I'm not really sure where I can get a Docker image that includes this fix.
It doesn't fix it per se; it more or less just reduces the probability of a crash.
Edit to add: the best way to prevent a crash is to just not output anything at all inside a fiber.
I've just encountered this issue, and using the workaround from @withinboredom did resolve the exception. In this project the culprit seems to be the monolog logger, as that is the only place fibers are being used.
I started working on a cgo library several weeks ago to let output from C be written without calling into Go. It's still a WIP: https://github.com/withinboredom/cgoc
There's a segfault once the number of concurrent requests gets high (due to usage of some C synchronization primitives from Go), and a memory leak, but it's pretty fast by itself (~8gbs on my machine).
I hope to have it working sometime in the next few months as a potential solution.
@withinboredom IMHO the best option would be to fix the issue directly in Go!
@dunglas I highly doubt it will ever be fixable, for very valid reasons. The reason it is failing boils down to the following:
- C creates a new thread
- C calls go_handle_request (ncgo = 1)
- Go calls frankenphp_execute_script (reenter C)
- PHP creates a fiber
- C calls Go (go_ub_write for example) (ncgo = 2) - crash as designed
According to the CL (https://go-review.googlesource.com/c/go/+/530480) this means changing the stack for an ncgo > 1 will never be possible -- for very valid safety reasons. This was a huge part of my approach in taking over Go threads (ncgo <= 1 always).
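As a rough illustration of the guard behind the crash described above, here is a toy model in Go. It is only a sketch: `toyM`, `callback`, and the stack addresses are hypothetical stand-ins, and the only thing taken from the runtime is the shape of the condition (`ncgo > 0 && !inBound`).

```go
package main

import (
	"errors"
	"fmt"
)

// toyM is a hypothetical stand-in for the runtime's per-thread state.
// It keeps only the fields the guard reads.
type toyM struct {
	ncgo   int     // number of cgo calls currently in progress
	lo, hi uintptr // recorded bounds of the thread's system stack
}

var errOutOfBounds = errors.New("cgocallback with sp out of bounds")

// callback mimics the guard: a callback that arrives while ncgo > 0 must
// find sp inside the recorded bounds, otherwise the runtime gives up.
func (m *toyM) callback(sp uintptr) error {
	inBound := sp > m.lo && sp <= m.hi
	if m.ncgo > 0 && !inBound {
		return errOutOfBounds
	}
	return nil
}

func main() {
	// Made-up addresses: the thread stack spans (0x1000, 0x9000].
	m := &toyM{ncgo: 1, lo: 0x1000, hi: 0x9000}

	// Callback made from the original thread stack: accepted.
	fmt.Println("callback on thread stack:", m.callback(0x8000))

	// Callback made after a fiber switched to its own stack: rejected.
	fmt.Println("callback on fiber stack: ", m.callback(0x20000))
}
```

The second call is the situation a Fiber creates: the stack pointer now lives in memory the runtime never recorded for this thread.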
If we can fix the ncgo issue, then we are free to muck around with the stack as much as we want.
One way to fix it might be to have go_handle_request return a pointer that we can continue with (making ncgo = 0), then continuing in C to frankenphp_execute_script, so if a fiber is created, and we call things like go_ub_write, ncgo == 1 and it will just reset the stack bounds just fine (in theory).
According to golang/go#62130 (comment), this seems fixable directly in Go for our case.
This would work: C changes stack back
I've been tearing apart the Fiber/boost context implementation to see if I can pop the stack back to original and jump to go, then on returning, replace the stack. The only problem with this approach (and fwiw, I do have it mostly working) is that it requires assembly and I am only familiar with x86-64 assembly. We would need to write assembly for every architecture (and there are some big perf hits here).
It turns out the patch to get it working is pretty darn simple.
diff --git a/src/runtime/cgocall.go b/src/runtime/cgocall.go
index 0d3cc40903..609c5dbc52 100644
--- a/src/runtime/cgocall.go
+++ b/src/runtime/cgocall.go
@@ -215,34 +215,6 @@ func cgocall(fn, arg unsafe.Pointer) int32 {
func callbackUpdateSystemStack(mp *m, sp uintptr, signal bool) {
g0 := mp.g0
- inBound := sp > g0.stack.lo && sp <= g0.stack.hi
- if mp.ncgo > 0 && !inBound {
- // ncgo > 0 indicates that this M was in Go further up the stack
- // (it called C and is now receiving a callback).
- //
- // !inBound indicates that we were called with SP outside the
- // expected system stack bounds (C changed the stack out from
- // under us between the cgocall and cgocallback?).
- //
- // It is not safe for the C call to change the stack out from
- // under us, so throw.
-
- // Note that this case isn't possible for signal == true, as
- // that is always passing a new M from needm.
-
- // Stack is bogus, but reset the bounds anyway so we can print.
- hi := g0.stack.hi
- lo := g0.stack.lo
- g0.stack.hi = sp + 1024
- g0.stack.lo = sp - 32*1024
- g0.stackguard0 = g0.stack.lo + stackGuard
- g0.stackguard1 = g0.stackguard0
-
- print("M ", mp.id, " procid ", mp.procid, " runtime: cgocallback with sp=", hex(sp), " out of bounds [", hex(lo), ", ", hex(hi), "]")
- print("\n")
- exit(2)
- }
-
if !mp.isextra {
// We allocated the stack for standard Ms. Don't replace the
// stack bounds with estimated ones when we already initialized
It turns out, because of a few conditions, nothing fancy is required:
1. pthread is really nice to give us proper stack bounds from the fiber
2. we are just "popping into go" to send some data in a channel and "pop back out"
3. we aren't jumping to/from other threads and then calling back into go from a different thread (the stack is coherent)
If we are OK with maintaining a custom version of Go forever ... then this is likely the best solution, but I highly doubt it would be accepted into Go. Note that this will probably crash very badly if output is sent from a thread created by the parallel extension, because (3) above would be violated. This can probably be mitigated by marshaling the output in C to the "main" thread whenever the current thread isn't the "main" thread. This needs some further testing.
Before I go into this further, are we ok with a custom go patch for the foreseeable future @dunglas? I will create a PR to go, arguing for this patch, but I suspect it won't be accepted.
If we are, this is what I propose:
A. test for (3) above and verify whether any further work is required
B. create PR to apply the patch (might be better to just maintain a fork of go?)
C. create a separate PR to apply any fixes/optimizations for (A)