Shall we switch back to fork()/exec() ?
siteshwar opened this issue · 6 comments
Forking a new process with posix_spawn()
can result in a race condition while setting terminal foreground process group. It was discussed earlier in this thread. This is dgk's response from the thread:
ksh93 uses posix_spawn() rather than fork/exec to run a simple command for
performance reasons. However, posix_spawnv() doesn't have a way to
set the terminal group before the exec(). The parent process sets the
terminal group which leads to the race.
There is a simple change you can make to a program to guarentee that
the terminal will get set at the beginning of the program.
You can add two lines. One line is
#include <termios.h>
and for the other line
tcsetpgrp(2,getpgrp());
Alternatively, you can write a program that does these two lines and
then execs the original program.
This problem has been seen by multiple users and reported in Red Hat bugzilla 1295563. Only reliable way to fix this problem is to switch back to fork()/exec(). I propose that we should enable _AST_no_spawnveg
flag by default.
posix_spawn()
is used due to performance reasons. We should analyze how much performance impact we will have if we stop using it. We can enable _AST_no_spawnveg
as an experiment and see if it causes performance degradation or any other regressions.
I reproduced this bug by adding a sleep(1) so ksh loses the race with the child.
diff --git a/src/cmd/ksh93/sh/xec.c b/src/cmd/ksh93/sh/xec.c
index 9d037351..4574ce6e 100644
--- a/src/cmd/ksh93/sh/xec.c
+++ b/src/cmd/ksh93/sh/xec.c
@@ -3358,6 +3358,7 @@ static pid_t sh_ntfork(Shell_t *shp, const Shnode_t *t, char *argv[], int *jobid
if (grp == 1) job.curpgid = spawnpid;
#ifdef SIGTSTP
if (grp > 0 && !(otype & FAMP)) {
+ sleep(1);
while (tcsetpgrp(job.fd, job.curpgid) < 0 && job.curpgid != spawnpid) {
job.curpgid = spawnpid;
}
Then I entered this Python command in an interactive ksh:
$ python3 -c 'import os; print(os.getpgrp(), os.tcgetpgrp(0))'
32695 98273
The bug is that the two numbers are different: ksh lost the race to python, so python did tcgetpgrp() before ksh did tcsetpgrp(). If I enter the same command in some other shell, the two numbers are equal.
Below is the comment I made in issue #34 where posix_spawn
was mentioned. I'll also add that the fish shell has had problems like the one described above from the use of posix_spawn
. And, as with the ksh code, supporting both fork and posix_spawn greatly complicated the fish code. Despite the fact that quick and dirty benchmarking suggested that, for fish at least, posix_spawn provided very little benefit I was unable to convince them to remove it.
BTW, the intended use case for posix_spawn() is spawning "helper processes" from processes with a large virtual address space (e.,g., a server like a SQL RDBMS), not starting new process groups by a CLI shell. Notably it doesn't provide the equivalent of setsid() and tcsetpgrp(). Which makes it a poor choice for shells like ksh. It's really meant to provide a cheap way for potentially large programs to run external commands without incurring the cost of duplicating its virtual memory structures (e.g., the process page tables).
While it is more efficient than fork() when the virtual process size is large that will not normally be an issue for a ksh process. And we still have to support fork() because posix_spawn() isn't available on all systems we want to support or is broken (e.g., on Cygwin). Also, note that on Linux it's actually implemented as a function not a syscall. On Linux we should be using clone(). And on other systems (e.g., BSD) we should probably be using vfork().
See https://bugzilla.redhat.com/show_bug.cgi?id=1295563 for another example of how posix_spawn() causes problems.
There was a long discussion about using posix_spawn() in the Go language: golang/go#5838. It looks like ultimately they chose to use the Linux clone() function.
I propose that we should enable _AST_no_spawnveg flag by default.
I agree. That should be the default build config. Anyone wanting to risk flakey behavior because they think the performance boost is worth it is welcome to enable posix_spawn
support for their build. But it's more important that the shell behave correctly and predictably than that it run a tiny percentage faster. And it should be easy to do some microbenchmarks with and without posix_spawn
to quantify the difference. I'll bet that unless you do something like cache hundreds of megabytes of data in ksh vars the performance will be close enough not to matter.
Please make FORK/EXEC the compilation default. There are enough flakey problems already without having to worry about whether a new process is in the proper process group or whether it is in the foreground process group for TTY purposes. Enough problems are enough, at least for now (I would think).
Also, from a performance perspective, KSH will always be a relatively tiny program (in memory size) compared with most any other real programs running on the system now-a-days. A modern fork of KSH will be one of the cheapest things that will ever happen on a modern UNIX box!
Thank you.