Investigate ways to terminate external processes cleanly (with and without goon)
alco opened this issue · 17 comments
The current behaviour of stopping a process is not satisfactory no matter how you slice it.
Without goon
Below, the output stream is given an invalid device name (:stdout instead of :stdio), which causes an error in the spawned process that controls the Erlang port.
iex(1)> p = Porcelain.spawn_shell "ping google.com", out: IO.stream(:stdout, :line)
%Porcelain.Process{err: nil,
out: %IO.Stream{device: :stdout, line_or_bytes: :line, raw: false},
pid: #PID<0.74.0>}
iex(2)>
=ERROR REPORT==== 20-Jan-2015::00:14:52 ===
Error in process <0.78.0> with exit value: {badarg,[{io,put_chars,[stdout,unicode,<<112 bytes>>],[]},{'Elixir.Enum','-reduce/3-fun-0-',3,[{file,"lib/enum.ex"},{line,1266}]},{'Elixir.Stream',do_unfold,4,[{file,"lib/stream.ex"},{line,1126}]},{'Elixir.Enum',reduce,3,[{file,"lib/enum.ex"},{line,1265}]},{...
iex(3)> Porcelain.Process.alive? p
true
iex(4)> Porcelain.Process.stop p
# the shell just hangs
# the external process 'ping' remains alive even after terminating the VM
An example of successfully stopping a port:
iex(1)> p = Porcelain.spawn_shell "ping google.com", out: IO.binstream(:stdio, :line)
%Porcelain.Process{err: nil,
out: %IO.Stream{device: :standard_io, line_or_bytes: :line, raw: true},
pid: #PID<0.73.0>}
PING google.com (173.194.113.193): 56 data bytes
64 bytes from 173.194.113.193: icmp_seq=0 ttl=57 time=11.042 ms
...
iex(2)> Porcelain.Process.stop p
true
We don't get any more input, but ping keeps running in the background.
With goon
iex(1)> p = Porcelain.spawn_shell "ping google.com", out: IO.binstream(:stdio, :line)
%Porcelain.Process{err: nil,
out: %IO.Stream{device: :standard_io, line_or_bytes: :line, raw: true},
pid: #PID<0.74.0>}
PING google.com (173.194.113.194): 56 data bytes
64 bytes from 173.194.113.194: icmp_seq=0 ttl=57 time=8.044 ms
...
iex(2)> Porcelain.Process.stop p
true
iex(3)> panic: write /dev/stdout: broken pipe
goroutine 3 [running]:
runtime.panic(0xa4ba0, 0x2102a5420)
/usr/local/Cellar/go/1.2.2/libexec/src/pkg/runtime/panic.c:266 +0xb6
log.(*Logger).Panicf(0x2102a6190, 0xde260, 0x3, 0x221040fe30, 0x1, ...)
/usr/local/Cellar/go/1.2.2/libexec/src/pkg/log/log.go:200 +0xbd
main.fatal_if(0xc2840, 0x2102bf7e0)
/Users/alco/extra/goworkspace/src/goon/util.go:38 +0x17e
main.outLoop(0x257338, 0x2102860e8, 0x256fe8, 0x210286008, 0x0, ...)
/Users/alco/extra/goworkspace/src/goon/io.go:151 +0x44a
created by main.wrapStdout
/Users/alco/extra/goworkspace/src/goon/io.go:34 +0x16a
goroutine 1 [chan receive]:
main.proto_2_0(0x7fff5fbf0100, 0xe3fc0, 0x3, 0xde7a0, 0x1, ...)
/Users/alco/extra/goworkspace/src/goon/proto_2_0.go:58 +0x3a3
main.main()
/Users/alco/extra/goworkspace/src/goon/main.go:51 +0x3b6
ping terminates, but goon panics.
Hey alco, any news on this? For me, stopping a process doesn't even work with goon. If it matters, the process is node.js and it's started with spawn_shell.
Going to look at fixing this in goon tonight.
Sorry for the wait @manukall. Is this still relevant to you?
i'm not working on that project anymore. thanks for looking into it, though.
I need this for testing https://github.com/hexpm/hex. During testing we need to make API calls to the server, https://github.com/hexpm/hex_web. We do this by starting the API server with a port (or Porcelain). The problem is that if the VM that runs the hex tests stops unexpectedly, the hex_web process keeps running.
EDIT: Actually the hex_web child process is always orphaned after the parent VM terminates.
I'm having the same problem as @ericmj. I'm hosting an HTTP server with IIS Express via Porcelain and the server does not get terminated when the BEAM VM shuts down.
I've been trying to write a plug that interacts with the Ember CLI, and have been seeing the same problem. After the Elixir application shuts down the Node.js process keeps running.
Could this be done with a separate OTP application (maybe that runs at a system level and is never killed) that registers external processes and os kills them under some conditions (e.g. when the OTP app that started the process ends)?
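As a rough illustration of the "register external processes and kill them when the owner goes away" idea, here is a minimal shell sketch. It is not an OTP application and not part of Porcelain; the function name reap_when_gone and its arguments are made up for this example. It polls the owner's PID and signals the registered PIDs once the owner has disappeared.

```shell
# Hypothetical watcher (illustration only; not an OTP application and not
# part of Porcelain). Usage: reap_when_gone WATCHED_PID PID [PID...]
reap_when_gone() {
  watched_pid=$1
  shift

  # kill -0 delivers no signal; it only checks that the PID exists
  # and may be signalled by the current user.
  while kill -0 "$watched_pid" 2>/dev/null; do
    sleep 1
  done

  # The watched process is gone: terminate every registered PID.
  for pid in "$@"; do
    kill -TERM "$pid" 2>/dev/null || true
  done
}
```

Polling by PID is simple but has known weaknesses: the watcher reacts with up to a second of delay, and a PID can in principle be reused by an unrelated process, so this is a starting point rather than a robust supervisor.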
Not just node apps. I'm testing this with a small python bottle server with the same issues.
Neither stop nor signal stops the underlying web server.
OSX, Python 3, using goon.
Would a minimal example be helpful?
I'm also interested in this issue; it's causing some messiness in a Mix task I use to run tests.
Hey folks! Thanks for the feedback. This is definitely an important issue. I'm hoping to have some time to work on this soon.
@peter-fogg Could you provide more details about the problem you're having? What result are you getting, and how is it different from the expected one?
Thanks everyone for bearing with me!
@alco Sure -- the short version is that I'm using Porcelain to coordinate some external servers during tests. We have our Phoenix server running against a mocked-out backend API server, and once that's running we run some tests against the Phoenix server. The gist of the tests is:
- Start Phoenix and API with Porcelain.spawn_shell(command, in: :receive, out: {:send, self()}, err: {:send, self()})
- Listen to both processes and wait for them to be ready to accept HTTP requests
- Start test process with System.cmd
- When test process is finished, shut down both servers with Porcelain.Process.signal proc, :int
- Exit with status of the test process
This all works, but I intermittently get a panic from Goon. It also occasionally leaves one of the server processes running as an orphan, so I have to kill it manually before I can run the tests again (it holds a port that the next test run needs).
Let me know if you need some more info. Thanks!
I had a similar requirement to stop spawned, interactive Docker containers when the parent Elixir process aborted. I now also have the requirement for arbitrary scripts I execute. Here is a quick and dirty way I met the requirement on Linux:
- Create a named pipe with mkfifo /home/user/parent_signal_1 (the name can be anything meaningful but should be unique for each instance of a child process).
- Create a bash script to start the child command and watch the named pipe for EOF:
#!/bin/bash
# Start passed in command in the background
# ($2 is deliberately unquoted so that 'sleep 60' splits into command and argument)
$2 &
CHILD_PID=$!
# Watch named pipe passed in (This will hang until EOF received from Elixir process)
cat "$1"
# Send signal of choice to child process
kill -s SIGKILL "$CHILD_PID"
- In the Elixir application, open an Erlang port on the named pipe and start the child process with the bash script via Porcelain. I use sleep 60 in this example, but that could be any script. It's important to use Process.link to link to the Porcelain pid, so that the process which opened the Erlang port will send the EOF to kill the child in the event the Porcelain process itself aborts.
_port = "/home/user/parent_signal_1"
|> String.to_charlist
|> :erlang.open_port([:eof])
%Porcelain.Process{pid: pid} =
"/home/user/start.sh /home/user/parent_signal_1 'sleep 60'"
|> Porcelain.spawn_shell(in: :receive, out: {:send, self()}, err: :out, result: :discard)
Process.link(pid)
That's it! The Docker container setup was slightly more involved and used dumb-init in order to kill PID 1, but the basic concept is the same. I doubt my approach can be codified into the official approach, but it has proven useful to me in the interim. Hopefully this is of some use to someone else trying to avoid zombie processes.
@gridbox If your child process finishes on its own, your script will hang, because it is still blocked on cat. So you're fixing one case but breaking the other.
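One possible way to cover both cases is to run the FIFO watcher in the background and wait on the child instead of on cat. The sketch below is a variant of the script above under that assumption, not something Porcelain or goon provides. The function name run_with_watchdog is made up for illustration, and unlike the original script it takes the command as separate arguments rather than as one quoted string.

```shell
# Hypothetical helper (not part of Porcelain or goon).
# Usage: run_with_watchdog FIFO COMMAND [ARGS...]
run_with_watchdog() {
  fifo=$1
  shift

  # Start the child command in the background.
  "$@" &
  child=$!

  # Watcher: block until the FIFO reaches EOF (every writer has closed it),
  # then kill the child. The shell's own read builtin is used so that
  # stopping the watcher later leaves no stray process behind.
  (
    while read -r _line; do :; done <"$fifo"
    kill -KILL "$child" 2>/dev/null
  ) &
  watcher=$!

  # Wait for the child, whether it finished on its own or was killed.
  child_rc=0
  wait "$child" || child_rc=$?

  # The child is gone, so the watcher is no longer needed.
  kill "$watcher" 2>/dev/null || true
  wait "$watcher" 2>/dev/null || true

  return "$child_rc"
}
```

With this shape, a call such as run_with_watchdog /home/user/parent_signal_1 sleep 60 returns as soon as sleep exits, and still kills sleep if the process holding the pipe open goes away first.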
Any updates on this issue?
I am also fighting lingering processes, with nothing in the docs to describe how to handle this. I would appreciate any update on this issue as well.
Is this helpful? https://hexdocs.pm/elixir/Port.html#module-zombie-processes
I hope Porcelain can use this wrapper in the basic driver.
In the end, after trying every approach in porcelain/os/port and a few other things, I gave up.
I wrote a script called kill_goon.sh to kill all the orphan processes spawned by Porcelain, and I call this script at the end of my task flow:
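#!/usr/bin/env bash
# Bash is required here: arrays are not available in POSIX sh.
goon_pids=($(ps -e | grep goon | grep -v grep | grep -v kill_goon | awk '{print $1}'))
for pid in "${goon_pids[@]}"
do
# Kill the children of each goon process.
pgrep -P "$pid" | xargs kill
done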