GoogleCloudPlatform/berglas

Ctrl-C with "berglas exec" results in duplicated SIGINT to child process

steinarvk-oda opened this issue · 7 comments

We use berglas exec to run an interactive child process. When the user hits Ctrl-C, the child process gets two SIGINTs in quick succession.

This duplicated SIGINT interacts badly with our child process (a Python REPL), which isn't supposed to shut down on Ctrl-C, but does crash when it receives two in quick succession.

This seems to be happening because berglas "forwards" signals from the parent process to the child process (with the signal.Notify handler in execRun). The two processes are in the same process group, so Ctrl-C actually delivers a SIGINT to each of them. So the child process gets two SIGINTs -- one forwarded from the parent process and one delivered directly to the child process.
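To make the mechanism concrete, here is a minimal sketch of a wrapper that forwards signals to its child in the way described above. It is not berglas's actual code: the name runForwarding is made up for illustration, and only SIGINT and SIGTERM are forwarded to keep the example small. Because the child is started without a new process group, a Ctrl-C at the terminal reaches the child directly and is also forwarded by the loop below.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// runForwarding is a hypothetical helper (not part of berglas). It starts the
// command in the caller's process group and relays SIGINT and SIGTERM to it
// until the child exits.
func runForwarding(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		return err
	}

	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(sigs)

	done := make(chan error, 1)
	go func() { done <- cmd.Wait() }()

	for {
		select {
		case sig := <-sigs:
			// On Ctrl-C the terminal has already signalled the child, so
			// this forwarded copy is the second SIGINT the child sees.
			_ = cmd.Process.Signal(sig)
		case err := <-done:
			return err
		}
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: wrap CMD [ARGS...]")
		os.Exit(2)
	}
	if err := runForwarding(os.Args[1], os.Args[2:]...); err != nil {
		fmt.Fprintln(os.Stderr, "child failed:", err)
		os.Exit(1)
	}
}
```

Running the repro program from this thread under such a wrapper should show the same doubled "interrupt" lines on Linux.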

What operating system? I remember researching this and finding that Mac and Linux differ in behavior...

I don't think we can disown the process, because then if Berglas crashes, you have an orphan.

Linux. I can reproduce it both on a local Linux machine and on Linux-based Docker images running on GCP. For convenience, I made a repro case:

package main

import (
	"fmt"
	"os"
	"os/signal"
)

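func main() {
	// Buffered so that signals arriving in quick succession are not dropped.
	sigs := make(chan os.Signal, 10)
	// With no signals listed, Notify relays every incoming signal.
	signal.Notify(sigs)
	// Print each signal as it arrives; the loop never exits on its own.
	for sig := range sigs {
		fmt.Println(sig)
	}
}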

Compile this program and run it as ./repro (don't "go run" it) to see the expected behaviour -- one "interrupt" per Ctrl-C. Running it under Berglas shows the problem: two "interrupt" lines per Ctrl-C. (Run killall -9 repro to exit the repro case.)

steinarvk@steinarvk-laptop:/tmp$ ./repro
^Cinterrupt
^Cinterrupt
Killed
steinarvk@steinarvk-laptop:/tmp$ berglas exec -- ./repro
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
urgent I/O condition
^Cinterrupt
interrupt
^Cinterrupt
interrupt
urgent I/O condition
urgent I/O condition
process exited non-zero: signal: killed

I haven't tested on Mac.

I'm only able to reproduce this with SIGINT. Does that match your experience? USR1 and other signals only fire once with the child.

The latest version (unreleased) catches SIGINT and immediately terminates the child. SIGINT is not forwarded to the child like other signals.

I can repro the duplication behaviour with any signal if I send a signal to the whole process group (kill -SIGUSR1 -$PGID).
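The shared process group behind this can be checked with a short sketch. The helper name sharesGroup is made up for illustration, and the example assumes a Unix-like system with a sleep binary on the PATH. By default a child started with os/exec stays in the parent's process group, so a group-wide signal reaches both processes; with Setpgid set, the child gets a group of its own.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// sharesGroup is a hypothetical helper. It starts a short-lived child and
// reports whether that child is in the caller's process group. When ownGroup
// is true the child is placed in a new process group instead.
func sharesGroup(ownGroup bool) (bool, error) {
	cmd := exec.Command("sleep", "1")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: ownGroup}
	if err := cmd.Start(); err != nil {
		return false, err
	}
	// Clean up the child regardless of the outcome.
	defer func() {
		_ = cmd.Process.Kill()
		_ = cmd.Wait()
	}()

	childPgid, err := syscall.Getpgid(cmd.Process.Pid)
	if err != nil {
		return false, err
	}
	return childPgid == syscall.Getpgrp(), nil
}

func main() {
	for _, own := range []bool{false, true} {
		same, err := sharesGroup(own)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("Setpgid=%v: child shares our process group: %v\n", own, same)
	}
}
```

This is an illustration rather than a suggested fix: a child in its own process group is no longer in the terminal's foreground group, which can cause problems for an interactive child that reads from the terminal.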

The new behaviour does eliminate the issue with Ctrl-C, but it ends up actually being worse for us. Now, no matter how the child process behaves, it simply dies on Ctrl-C, which isn't the expected behaviour for all programs (e.g. a Python REPL).

I can repro the duplication behaviour with any signal if I send a signal to the whole process group (kill -SIGUSR1 -$PGID).

Right, because that's how Unix processes work. You shouldn't be signaling the group; you should be signaling berglas, which will forward the signal. That's documented in the exec command:

Berglas will remain the parent process, but stdin, stdout, stderr, and any signals are proxied to the child process.

I can update the behavior to proxy SIGINT on the first invocation and then hard-kill on the second. Would that help?

I've just released v0.6.0 which includes the new signal handling behavior.
