elves/elvish

gdb not working inside elvish

tw4452852 opened this issue · 10 comments

What happened, and what did you expect to happen?

# test.c
void main () {}

Compile with gcc: gcc test.c

~/t> gdb a.out
GNU gdb (GDB) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb) b main
Breakpoint 1 at 0x112d
(gdb) r
Starting program: /home/tw/t/a.out
During startup program exited with code 41.

As you can see, the breakpoint isn't hit.

By comparison, when I use bash, everything works fine:

[tw@nio t]$ export SHELL=/bin/bash
[tw@nio t]$ gdb a.out
GNU gdb (GDB) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...
(gdb) b main
Breakpoint 1 at 0x112d
(gdb) r
Starting program: /home/tw/t/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".

Breakpoint 1, 0x000055555555512d in main ()
(gdb)

BTW, I dug into the cause and came across this: https://stackoverflow.com/a/64274727.
Simply put, the root cause is that elvish -c and bash -c behave differently.

Output of "elvish -version"

0.20.0-dev.0.20240201150239-7e0b6ee8e626

xiaq commented

That behavior is news to me. And apparently all the traditional POSIX-ish shells exhibit this behavior 🤯:

~> bash -c "echo $$; bash -c 'echo $$'"
92128
92128
~> zsh -c "echo $$; bash -c 'echo $$'"
92138
92138
~> dash -c "echo $$; bash -c 'echo $$'"
92140
92140
~> ksh -c "echo $$; bash -c 'echo $$'"
92163
92163
~> elvish -c "echo $pid; bash -c 'echo $$'"
92144
92145

(Note how every shell except Elvish outputs the same PID twice.)

I wonder if this behavior is mandated by POSIX, but regardless, I suppose the more traditional shells can do this because there's no difference in observed behavior whether the last command is fork/exec-ed or simply exec-ed (other than the value of PID).
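
The fork/exec versus plain exec distinction can be made visible without relying on any shell's implicit optimization. The following is a minimal sketch, assuming a POSIX sh on PATH; the function names are hypothetical. An explicit exec replaces the outer shell, so both lines show one PID, while a trailing no-op forces the outer shell to fork the inner one.

```shell
# Explicit exec: the inner sh replaces the outer sh, so the PID is unchanged.
same_pid_with_exec() {
    sh -c 'echo $$; exec sh -c "echo \$\$"'
}

# The trailing ":" means the inner sh is not the last command, so the outer
# sh has to fork it and wait; the two PIDs differ.
different_pid_with_fork() {
    sh -c 'echo $$; sh -c "echo \$\$"; :'
}

same_pid_with_exec
different_pid_with_fork
```

The first call prints the same number twice; the second prints two different numbers.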

Elvish can't really do this because if the last command exits with a non-zero status, Elvish does something extra: it shows a stack trace. These two scripts behave differently:

false
exec false

I'd recommend that you either don't configure Elvish as the login shell (make it the default shell of the terminal instead) or use the SHELL override workaround. In the meantime I'll document gdb as one of the programs that assume your login shell is POSIX-ish.

Understood. BTW, the trailing a is missing in the above commit; it should read:

env SHELL=/bin/sh gdb $@a
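
For anyone launching gdb from a POSIX shell or a script, the same override can be scoped to a single command instead of exporting SHELL for the whole session. This is a minimal sketch; with_posix_shell is a hypothetical helper name, and /bin/sh is assumed to exist. The commented-out alternative uses gdb's own startup-with-shell setting, which should sidestep the shell entirely, though it was not tried in this thread.

```shell
# Hypothetical helper: run one command with SHELL pointing at a POSIX shell,
# leaving SHELL unchanged in the calling session.
with_posix_shell() {
    env SHELL=/bin/sh "$@"
}

# Intended use (needs gdb and a compiled a.out):
#   with_posix_shell gdb a.out
#
# Alternative that asks gdb not to start the program through a shell:
#   gdb -ex 'set startup-with-shell off' a.out

with_posix_shell sh -c 'echo "$SHELL"'    # prints /bin/sh
```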

Be careful with double-quoted string interpolation:

bash> bash -c 'echo $$; bash -c "echo $$"; bash -c "echo \$$"; bash -c "echo $$"'
87794
87794
87796
87794
bash> elvish -c "echo $$; bash -c 'echo $$'"; echo $$
87778
87778
87778

And...

elvish> bash -c "echo $$; bash -c 'echo $$'"
87764
87765
elvish> elvish -c "echo $$; bash -c 'echo $$'"
Multiple parse errors:
  should be variable name
    code from -c:1:7-7: echo $$; bash -c 'echo $$'
  should be variable name
    code from -c:1:8-8: echo $$; bash -c 'echo $$'
Exception: elvish exited with 2
  [tty 147]:1:1-38: elvish -c "echo $$; bash -c 'echo $$'"

I don't know why the use of a non-POSIX shell by gdb (whether Xonsh or Elvish) causes problems but it is not because a POSIX subshell reports the same PID as its parent when expanding the $$ variable. Even POSIX shells spawn a new process, with a unique PID, when running their final command. Try this:

bash> echo $$; bash -c 'echo $$; bash -c "echo \$$"'
87778
87949
87950

Also, note that other complex interactive programs like Vim and Emacs work just fine when launched by Elvish, with the caveat that you typically have to configure the editor to use a POSIX-compatible shell rather than whatever non-POSIX shell the SHELL env var refers to.

Whenever I see a problem like this the first question I ask is what arguments are passed to the program launched by a program like gdb. It seems likely that those arguments are only valid for POSIX shells, which is not surprising. The question is what those arguments are and whether a shell like Elvish (or Xonsh) can accommodate those expectations.
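
One way to answer that question is to point SHELL at a small shim that records its arguments before handing off to a real shell. The following is a rough sketch, not something run against gdb in this thread; make_logging_shell and both paths are hypothetical, and it assumes the launching program honors SHELL.

```shell
# Hypothetical helper: write a shim "shell" to $1 that appends each of its
# arguments, one per line, to the log file $2 and then runs /bin/sh.
make_logging_shell() {
    shim=$1
    log=$2
    cat > "$shim" <<EOF
#!/bin/sh
# Record every argument on its own line, then hand off to the real shell.
printf '%s\n' "\$@" >> "$log"
exec /bin/sh "\$@"
EOF
    chmod +x "$shim"
}

# Intended use (needs gdb and a compiled a.out):
#   make_logging_shell /tmp/logsh /tmp/logsh.log
#   env SHELL=/tmp/logsh gdb a.out
#   cat /tmp/logsh.log
```

Because the shim execs /bin/sh with the same arguments, the debugged program should still start normally while the log shows exactly what the shell was asked to run.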

xiaq commented

Be careful with double-quoted string interpolation:

Yes, that's why I used single quotes in the argument to -c.

I don't know why the use of a non-POSIX shell by gdb (whether Xonsh or Elvish) causes problems but it is not because a POSIX subshell reports the same PID as its parent when expanding the $$ variable. Even POSIX shells spawn a new process, with a unique PID, when running their final command. Try this:

bash> echo $$; bash -c 'echo $$; bash -c "echo \$$"'
87778
87949
87950

I don't know about your version of bash or if you have any configuration, but this command outputs two identical numbers in the second and third lines on both my macOS machine and Linux machine.

Yes, that's why I used single quotes in the argument to -c.

It's not clear we are talking about the same thing because it isn't obvious from your examples what the initial context is. In particular, which shell is evaluating the statements you ran. For example, in what shell did you run the following command block?

~> bash -c "echo $$; bash -c 'echo $$'"
92128
92128

That only produces the same two lines if the shell running that statement is a POSIX shell like Bash. And that is because the interpolation done by Bash for the $$ token ignores that the second instance is inside a single-quoted string. Again, compare the result of that command block when launched from Elvish and Bash:

elvish> bash -c "echo $$; bash -c 'echo $$'"
91445
91446
bash> bash -c "echo $$; bash -c 'echo $$'"
91447
91447

The difference is due to the fact that Elvish does not replace the $$ token while Bash does replace that token in both places in the double-quoted string.

Try this from a Bash prompt:

bash> echo $$; bash -c "echo $$; bash -c 'echo $$'"
91447
91447
91447
bash> echo $$; bash -c "echo $$; bash -c 'echo \$$'"
91447
91447
91542
bash> echo $$; bash -c "echo \$$; bash -c 'echo \$$'"
91447
91543
91544

Note that the presence of $$ inside a single-quoted string inside a double-quoted string does not affect the interpolation of $$ inside a double-quoted string. Both instances are substituted by the POSIX shell since the POSIX rules for variable interpolation do not distinguish "$var" from "'$var'":

bash-3.2$ echo "$$"
91575
bash-3.2$ echo "'$$'"
'91575'
bash-3.2$ echo $$
91575
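
The same rule can be checked with an ordinary variable, which makes the result deterministic. A short sketch for a POSIX sh; the variable name var and its values are arbitrary.

```shell
var=outer

# Inside double quotes a single quote is an ordinary character, so $var is
# still expanded by the current shell.
echo "'$var'"                       # prints 'outer'

# Single quotes at the top level suppress expansion entirely.
echo '$var'                         # prints $var

# Unescaped: the current shell substitutes $var before the inner sh runs.
sh -c "var=inner; echo $var"        # prints outer

# Escaped: the inner sh receives a literal $var and expands it itself.
sh -c "var=inner; echo \$var"       # prints inner
```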

xiaq commented

Yes, that's why I used single quotes in the argument to -c.

It's not clear we are talking about the same thing because it isn't obvious from your examples what the initial context is. In particular, which shell is evaluating the statements you ran. For example, in what shell did you run the following command block?

Those were all from Elvish.

Again, I think there's something different about your bash - I consistently get the same PIDs twice running bash -c "echo $$; bash -c 'echo $$'" from Elvish on all the machines I have access to.

I hate mysteries when the observed behavior is contrary to my expectations so I did a quick experiment. I have nine systems that I use for development. Two macOS on bare metal (Apple Silicon/arm64 and x86_64), five Linux distros, FreeBSD 13, and Windows (all as VMs hosted by VMware). Ignoring Windows I see the behavior I reported on six of those systems, and two of them exhibit the behavior observed by @xiaq.

The newest Bash version I have available that behaves as I originally reported is Bash 5.0.17.

The oldest version I have available that behaves as @xiaq reported is Bash 5.2.15.

So sometime after the release of Bash 5.0.17 an optimization was introduced that sometimes elides the fork() when spawning the last process. I say sometimes because setting a trap inhibits the behavior:

elvish> bash -c "echo $$; bash -c 'echo $$'"
84171
84171
elvish> bash -c "trap 'echo exiting' EXIT; echo $$; bash -c 'echo $$'"
84172
84173
exiting

There may, of course, be other things that inhibit the fork() when spawning the last process.
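
To check which behavior a given machine's Bash has, the comparison can be scripted so that it does not depend on the quoting rules of the calling shell. A minimal sketch, assuming bash is on PATH; compare_pids, probe_plain, and probe_with_trap are hypothetical names.

```shell
# Hypothetical helper: read two PIDs (one per line) from stdin and report
# whether they match.
compare_pids() {
    read -r outer
    read -r inner
    if [ "$outer" = "$inner" ]; then
        echo same
    else
        echo different
    fi
}

# No trap: newer bash versions may exec the last command directly ("same"),
# while older ones fork ("different").
probe_plain() {
    bash -c 'echo $$; bash -c "echo \$\$"' | compare_pids
}

# With an EXIT trap, bash has to outlive its last command in order to run
# the trap, so it forks, matching the observation above.
probe_with_trap() {
    bash -c 'trap ":" EXIT; echo $$; bash -c "echo \$\$"' | compare_pids
}

probe_plain
probe_with_trap
```

The single-quoted outer argument keeps the calling shell from expanding $$, so the result is the same whether the script is started from Bash, Elvish, or anything else.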

So the question of why the PID reported by the last process spawned by Bash is sometimes the same and sometimes different is no longer a mystery. Sadly, I don't think this explains why Elvish is incompatible with gdb. The most likely explanation is that gdb is spawning Elvish to run a command block that is valid POSIX syntax but invalid Elvish syntax.

P.S., I don't have any Bash customization on any of my systems. No ~/.bashrc, ~/.bash_profile, ~/.profile, etc. Only whatever initialization scripts are provided by the OS.

As a fun aside... If you do an internet search for "bash fork optimization" you'll get results like https://stackoverflow.com/questions/76310409/does-bash-promise-to-optimize-c-into-plain-exec-in-simple-cases. In other words, whether Bash should sometimes elide the final fork() it would otherwise perform is debatable. Note that programs, like Elvish, which use the equivalent of the posix_spawn() API cannot perform that PID "optimization" since the "spawn" API they use does not allow for replacing the current process with the new process.