Fail to kill dropbear and reset network adapter

Question

mwHsVxkkFZSHo2SDqisDPNr7I opened this issue 11 years ago · 6 comments

mwHsVxkkFZSHo2SDqisDPNr7I commented 11 years ago

I have managed to get it working well with hardened gentoo on an ARM device.

However, I ran into one strange problem: when dropbear is started, the dropbear.pid file is not created. When I changed the option to /tmp/blah, the pid file was created correctly, but the kill command still did not appear to end dropbear and destroy my shell.

I ended up modifying the script module-setup.sh adding:

inst $(which killall) /sbin/killall

and then changing the sshd_kill.sh definition to:

#!/bin/sh

killall -9 dropbear

Additionally I added the line:

ip addr flush dev eth0

to the sshd_kill.sh script to allow the gentoo network init script to set up the network device without error. I'm not 100% sure this was still necessary after I added the killall.

Answer 1 · 2013-11-14T06:02:05.000Z

Thanks for the report.

That kill command with pidfile doesn't destroy active shell for me either - I think dropbear main pid (that listens on port) doesn't kill forked shell pids on stop, iirc it's actually same for openssh sshd, so that logged-in users won't lose access to host on e.g. bad restart or accidental stop of the main pid.

Don't think that's much of an issue though - you can just logout and be done with it, and blanket killall is kinda bad, as it could kill random pids user might want to keep, and not necessarily dropbear sshd's - could be some random script user decided to name "my_dropbear_thing".
Cleaner way might be to pass some dash script path as a shell to dropbear, that will do "echo $$ >/tmp/shell_pid && exec dash".
I'll look into it a bit later, when I'll get to reboot something maybe.
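A minimal sketch of that wrapper idea, not the module's actual code. The wrapper path and the SHELL_PIDFILE and REAL_SHELL overrides are assumptions added for illustration, and how dropbear gets pointed at the wrapper is left out:

```shell
# Hypothetical wrapper; file locations and the REAL_SHELL/SHELL_PIDFILE
# overrides are assumptions made for this sketch.
wrapper=${WRAPPER:-/tmp/shell_wrapper.sh}

cat > "$wrapper" <<'EOF'
#!/bin/sh
# Record our own PID, then replace this process with the real shell.
# exec keeps the PID, so the file ends up naming the interactive shell.
echo "$$" > "${SHELL_PIDFILE:-/tmp/shell_pid}" && exec "${REAL_SHELL:-dash}" "$@"
EOF
chmod +x "$wrapper"
```

A cleanup script could then signal the PID stored in the file instead of matching on process names. Note that a single file only remembers the most recent login.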

And I definitely don't want to add ip addr flush dev eth0 line for my setup: without the flush, whatever network setup scripts can totally fail or not even be enabled (had both cases on a remote system) and connection will still be working perfectly (otherwise you wouldn't have been able to boot).

Even if proper rootfs init will fail to start proper sshd, working ping should at least be an indication that it didn't abort (causing kernel panic) or anything like that.
And whatever may be wrong with the boot, working net allows one to easily run some stray sshd for debugging without going through all the redundant hoops of starting proper networking again.

On top of that, networking is set up in dracut net module, so I think it should only be reset there (probably after all other cleanup hooks are run), not in a random place like this.
Bet gentoo has either flag or free-form shell hooks that should allow you to easily do such flush before setting whatever proper configuration, or dracut network module might have some option (e.g. "rd.net-reset") to clean up parameters it set up before pivot (and if not, and you have legit use-case for such flush, maybe submit patch there?).
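On the Gentoo side, one place such a flush could live is a preup() hook in /etc/conf.d/net, which netifrc calls before bringing an interface up, with IFACE set to the interface name. This is a sketch under the assumption that netifrc manages the interface; it has not been tested against the setup described in this thread:

```shell
# /etc/conf.d/net fragment (assumes netifrc manages the interface).
# Drop any addresses left over from the initramfs before netifrc
# configures the interface itself.
preup() {
    ip addr flush dev "${IFACE}"
    # Always report success so a failed flush does not block interface setup.
    return 0
}
```

Keeping the flush in the distribution's own network configuration leaves the initramfs network untouched until the real system is ready to take over.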

Answer 2 · 2013-11-14T07:52:01.000Z

I agree, the "ip addr flush dev eth0" does not belong in dracut-crypt-sshd. I can probably add it to one of my init scripts or somewhere else in dracut. Mostly I just wanted to see if that was normal behaviour (that networking stayed up after switch_root).

The reason I need to do it is that the default gentoo init scripts that require networking try to set up eth0. This was failing because it was already set up, which then led to all networking services failing to start. I was unable to put eth0 down, which may have been because dropbear was still open/using it inside the initramfs. Maybe now that dropbear is terminated properly the networking script will succeed, I have not checked.

I agree the killall is not a good way to go about it, although it is a very simple workaround in my case. I like your idea about the dash script :-)

Answer 3 · 2013-11-14T09:25:04.000Z

One other obvious deal-breaker for such networking flush I forgot about is nfs-root, iscsi, aoe and all the other "device/fs over net" things which dracut can do - these might be the actual reason dracut doesn't kill network before pivot by default.

I think fix in 3c3b3f4 should be even simpler - just checked if dropbear spawns all its child-pids under same sid (it does not) or double-forks them (does not either) and apparently just killing pids with ppid (parent pid) = main_pid should work.

One quirk is that dracut doesn't seem to include a tool to do that, and grepping "Ppid: $main_pid" over /proc/*/status sounds hacky, so included "pkill" tool from procps package - guess everyone should have it for "ps", and at least one other dracut module also includes pkill.
Also confirmed that it's indeed the case that dropbear doesn't kill its children on exit by itself.
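For comparison, the /proc-based lookup mentioned above might look roughly like this in plain sh. The function name children_of is made up for this sketch, and the field in /proc/<pid>/status is spelled "PPid:". With procps available, pkill -P "$main_pid" does the matching and the signalling in one step:

```shell
# List PIDs whose parent is $1 by reading the PPid field of
# /proc/<pid>/status (Linux only). children_of is a hypothetical helper.
children_of() {
    parent=$1
    for status in /proc/[0-9]*/status; do
        # A process can exit between the glob and the read; skip it then.
        ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "$status" 2>/dev/null) || continue
        if [ "$ppid" = "$parent" ]; then
            pid=${status#/proc/}
            echo "${pid%/status}"
        fi
    done
}
```

Something like `kill $(children_of "$main_pid")` would then signal the children, though there is a window between listing and killing that pkill -P narrows.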

Didn't test it in dracut yet, but the change seems fairly minor and safe; please leave a note if it doesn't work for some reason (as these "safe" one-liners tend to do).

Answer 4 · 2013-12-11T19:19:12.000Z

It seems to me that the child processes need to be killed before the main process, e.g. the following works here:

pidfile=/tmp/dropbear.pid
[ -e $pidfile ] || exit 0
read PID < $pidfile

# stop listening process
kill -SIGSTOP $PID

# Kill all child processes
pkill -P $PID

# Kill listening process
kill $PID

I use it in a module with the same purpose.

Answer 5 · 2013-12-13T02:12:17.000Z

Ah, true, ppid should probably change to 1 when parent gets killed. Thanks.

Answer 6 · 2013-12-13T02:25:13.000Z

Though you don't seem to have kill -CONT there, and I'm fairly sure stopped pid will never receive that TERM signal, i.e.:

% while :; do sleep 1; done &
[1] 13026
% kill -STOP 13026
[1]  + suspended (signal)  while :; do; sleep 1; done
% kill 13026
% jobs
[1]  + suspended (signal)  while :; do; sleep 1; done
% kill -CONT 13026
[1]  + terminated  while :; do; sleep 1; done
%

Though it's weird: when I use %1 instead of explicit pid, zsh seems to terminate the thing properly - checks if job is suspended and sends the CONT implicitly:

write(3, "kill %1\n", 8)                = 8
...
kill(-13140, SIGTERM)                   = 0
kill(-13140, SIGCONT)                   = 0
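Putting the points from this thread together, the stop sequence with an explicit CONT could be sketched as follows. The function name stop_listener is hypothetical, the default pidfile path is taken from the earlier comment, and this has not been tested inside dracut:

```shell
# Stop a listening daemon and its direct children.
# stop_listener is a hypothetical helper; $1 is the pidfile path.
stop_listener() {
    pidfile=${1:-/tmp/dropbear.pid}
    [ -e "$pidfile" ] || return 0
    read -r pid < "$pidfile" || return 0

    # Freeze the listener so it cannot fork new children meanwhile.
    kill -STOP "$pid" 2>/dev/null || return 0
    # Terminate current children while their ppid still points at $pid.
    pkill -P "$pid" || true
    # TERM stays pending while the process is stopped...
    kill -TERM "$pid"
    # ...so resume the process to let the pending TERM be delivered.
    kill -CONT "$pid"
}
```

The order matters: children are matched by parent pid before the listener dies, and the final CONT is what lets the stopped listener act on the pending TERM.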