latchset/clevis

Killing the child process of clevisloop does not work

oldium opened this issue · 0 comments

EDIT: Updated. I initially thought this was about killing the parent process, but it is actually about a child process.

I am migrating my script with TPM1.2 support to Clevis and found this strange line of code in Clevis scripts:

ps -l | awk -v pid="$pid" '$4==pid { system("kill " $3) }'

It looks like the intention was to kill the child process of clevisloop itself. But it relies on ps -l, which is fragile: its output differs between a normal system and BusyBox, and the set of columns also varies between BusyBox versions.

I am on Debian 12 (Bookworm). I added some debug output to the Clevis local-bottom script and found that it really cannot work like this.

The clevisloop PID was 275:

The ps -l output shows the following (note the header: it has a different set of columns):
PID   USER     COMMAND
    1 0        {init} /bin/sh /init
  ...
  275 0        {clevis} /bin/bash /scripts/local-top/clevis
  ...
  436 0        sleep 5
  ...
  596 0        {clevis} /bin/sh /scripts/local-bottom/clevis
  ...
  598 0        ps -l

Indeed, the BusyBox ps in Debian supports very few options:

#> busybox ps --help
BusyBox v1.35.0 (Debian 1:1.35.0-4+b3) multi-call binary.

Usage: ps [-o COL1,COL2=HEADER] [-T]

Show list of processes

        -o COL1,COL2=HEADER     Select columns for display
        -T                      Show threads

So I used the more portable ps -o ppid,pid,comm instead (only printing the output, not killing anything) and got the following:

PPID  PID   COMMAND
    0     1 init
  ...
    1   275 clevis
  ...
  275   436 sleep
  ...
    1   596 clevis
  596   599 ps

It should be possible to find the child process with ps -o ppid,pid.
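
A minimal sketch of that approach, assuming ps -o ppid,pid is available in the initramfs BusyBox (as it is in the Debian build shown above). The function names children_of and kill_children are hypothetical and are not existing Clevis functions; this is an illustration, not a tested patch:

```shell
# Hypothetical helper: print the PIDs of the direct children of PID $1.
# Expects "PPID PID ..." rows on stdin, as printed by `ps -o ppid,pid`.
children_of() {
    # NR > 1 skips the header row; column 1 is PPID, column 2 is PID.
    awk -v ppid="$1" 'NR > 1 && $1 == ppid { print $2 }'
}

# Hypothetical helper: send SIGTERM to every direct child of PID $1.
kill_children() {
    for child in $(ps -o ppid,pid | children_of "$1"); do
        # The child may have exited already, so ignore kill errors.
        kill "$child" 2>/dev/null || true
    done
}
```

With the process list above, kill_children 275 would select PID 436 (the sleep 5). Requesting the columns explicitly with -o means the awk filter no longer depends on the default column layout of ps -l.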