watchexec/watchexec

Does not detect successful erl command exit

Closed this issue · 16 comments

When running erl -s init stop under Watchexec, it hangs indefinitely, even though the same command exits cleanly on its own:

$ time erl -s init stop
Erlang/OTP 26 [erts-14.2.2] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
________________________________________________________
Executed in    1.13 secs      fish           external
   usr time  121.84 millis   66.00 micros  121.78 millis
   sys time  195.03 millis  850.00 micros  194.19 millis
$ erl -s init stop
Erlang/OTP 26 [erts-14.2.2] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]
$ echo $status
0
$ watchexec -f 'apps/*/src/**.erl' -- erl -s init stop
[Running: erl -s init stop]
^C[Waiting 60s for processes to exit before stopping...]
fish: Job 1, 'watchexec -f 'apps/*/src/**.erl…' terminated by signal SIGKILL (Forced quit)

(The force quit comes from running pkill -9 watchexec in another terminal)

To reproduce, install Erlang, e.g. using mise with mise use erlang@26.

$ watchexec --version
watchexec 1.25.1 (d3949cc 2024-01-05) +pid1
commit-hash: d3949cc6e9879225ab8191ad558691da97d28a23
commit-date: 2024-01-05
build-date: 2024-01-05
release: 1.25.1
features: default,pid1

macOS Sonoma 14.3.1

Weird! I'll try to reproduce, but can you provide a log?

Ugh, actually this is showing that the command is plain not running at all. Like, you'd expect the Erlang/OTP 26... line to print, and it just doesn't.

@passcod Oops, forgot to attach it. Here's a new one: watchexec.2024-02-29T06-59-09Z.log

@passcod The command is actually running. I tried it with extra flags that sent network traffic and that was executed (and received on the other side). It seems just the output and the exit is not working correctly. The erl executable can be a bit weird and non-unixy sometimes... let me know if you need more help testing it.

Yeah, I've also maybe reproduced it (hard to tell if it's the same tbh) with a completely different command that runs but doesn't output anything within watchexec, so there's something very odd going on.

Logs don't show anything out of the ordinary afaict.

I have a reproduction where I send network traffic over a socket, which runs (but hangs). I tried to reproduce it by just writing to a file instead, but that doesn't seem to run at all... 🤯

Ah, found a "local" reproduction:

$ watchexec --print-events --verbose -f 'apps/*/src/**.erl' -- erl -noinput -eval '\'file:write_file("TESTFILE", "hello")\'' -s init stop
[EVENT 0] Event
[Running: erl -noinput -eval 'file:write_file("TESTFILE", "hello")' -s init stop]
[Command was successful]
^C[EVENT 0] Event source=Keyboard signal=Interrupt
[Waiting 60s for processes to exit before stopping...]
$

The above example works (and writes to TESTFILE with the contents hello). If you remove the -noinput flag, it hangs without writing to the file.

Wow, good job for finding that! Does the network repro hang if -noinput is passed?

No, it works too. Seems that not passing that flag is the core of the issue.

Hmm, also passing --no-process-group fixes it.
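
Nothing in this thread establishes why the process group matters. One plausible explanation, an assumption rather than something confirmed here, is that erl without -noinput tries to read from the terminal while it sits in a background process group, and the kernel stops it (SIGTTIN) instead of letting it run. A minimal sketch for checking that while the hang is happening: inspect the process state with ps, where a STAT value beginning with T means the process is stopped. The helper names below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of watchexec or Erlang.

# Succeeds if a ps STAT string (e.g. "T", "T+") denotes a stopped process.
is_stopped_stat() {
  case "$1" in
    T*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Succeeds if the process with the given PID is currently stopped.
pid_is_stopped() {
  is_stopped_stat "$(ps -o stat= -p "$1" | tr -d ' ')"
}
```

While watchexec appears hung, running something like pid_is_stopped "$(pgrep -n beam.smp)" && echo stopped in another terminal would show whether the Erlang VM is stopped rather than running. That beam.smp is the process name to look for is also an assumption; adjust it to whatever ps shows on your system.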

lpil commented

Hi folks! Is this expected behaviour? Have been confused by it myself since upgrading. Thank you

@lpil if the workarounds above (e.g. using --no-process-group) work out for you, then it's effectively expected behaviour, though I'm working on a better alternative to ditching process groups entirely. I don't have a lot of time to spend on unpaid OSS at the moment, so it might take a while.

lpil commented

Hi! It does work, thank you. Sorry, I was wondering if it was a bug or the intended behaviour as I don't understand what precisely is happening here.

Thank you for your work and for answering my question, I am super grateful! 💜

2.0.0 has --wrap-process=session, which solves this issue.

lpil commented

Hello! Just checking, does that mean I should always use that flag when watching Erlang programs? Thank you

Yes
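
For scripts that have to work with both versions mentioned in this thread, here is a small sketch of picking the flag from the installed version. It assumes the first line of watchexec --version looks like the output quoted earlier (watchexec 1.25.1 (...)), that 1.x installs have --no-process-group as 1.25.1 does, and that anything newer accepts --wrap-process=session. The function names are hypothetical; only the two flags come from this thread.

```shell
#!/bin/sh
# Hypothetical helpers; only the two flags themselves come from this thread.

# Read `watchexec --version` output on stdin and print the version number
# from its first line, e.g. "watchexec 1.25.1 (d3949cc ...)" gives "1.25.1".
parse_watchexec_version() {
  awk 'NR == 1 { print $2 }'
}

# Print the process-wrapping flag to use for a given watchexec version.
# Assumption: 1.x uses the --no-process-group workaround; everything else
# (2.0.0 and later) uses --wrap-process=session.
wrap_flag_for() {
  case "$1" in
    1.*) echo "--no-process-group" ;;
    *)   echo "--wrap-process=session" ;;
  esac
}

# Possible usage (requires watchexec to be installed):
#   flag=$(wrap_flag_for "$(watchexec --version | parse_watchexec_version)")
#   watchexec "$flag" -f 'apps/*/src/**.erl' -- erl -s init stop
```

If you only ever run 2.0.0 or later, passing --wrap-process=session directly is simpler than any of this.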