shell script stdout redirect issue
jianzzha opened this issue · 2 comments
Discussed in #341
Originally posted by jianzzha December 8, 2021
Hi, first of all, this is a very cool project and I started to love it on my first try :)
I'm trying to use it to simulate what an SR-IOV operator would do: write to a /sys file based on the VF setting described in a CR. Here is a sample from the script:
delete_vf () {
    current=$(cat ${pci_dev_dir}/${pci}/sriov_numvfs)
    if ((current > 0)); then
        echo "**** echo 0 > ${pci_dev_dir}/${pci}/sriov_numvfs"
        echo 0 > ${pci_dev_dir}/${pci}/sriov_numvfs
    fi
}
and part of the main body looks like:
numvfs=$(jq -r '.[0].object.spec.numvfs' ${BINDING_CONTEXT_PATH})
pci=$(jq -r '.[0].object.spec.pci' ${BINDING_CONTEXT_PATH})
pci=$(unify_pci_addr $pci)
echo "########################## numvfs=${numvfs}, pci=${pci}"
if [[ ! -e ${pci_dev_dir}/${pci} ]]; then
    echo "!!!!!!! not exist: ${pci_dev_dir}/${pci}"
    exit 0
fi
watchEvent=$(jq -r '.[0].watchEvent' ${BINDING_CONTEXT_PATH})
if [[ "${watchEvent}" == "Deleted" ]]; then
    echo "***** delete VFs"
    delete_vf
fi
Basically when the CR is deleted, the shell script will do: echo 0 > ${pci_dev_dir}/${pci}/sriov_numvfs
The first time the delete event is received, this write always fails:
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"########################## numvfs=3, pci=0000:af:00.1","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:55:55Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"***** delete VFs","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:55:55Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"**** echo 0 \u003e /sys/bus/pci/devices/0000:af:00.1/sriov_numvfs","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:55:55Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"/hooks/crd-hook.sh: line 17: echo: write error: Invalid argument","output":"stderr","queue":"main","task":"HookRun","time":"2021-12-08T20:55:57Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"error","msg":"Hook failed. Will retry after delay. Failed count is 1. Error: crd-hook.sh FAILED: exit status 1","queue":"main","task":"HookRun","time":"2021-12-08T20:55:57Z"}
The hook is then retried and succeeds:
#################### Event","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"########################## numvfs=3, pci=0000:af:00.1","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"***** delete VFs","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"Usage: \u0026{Sys:7.386ms User:10.037ms MaxRss:34856}","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"Hook executed successfully","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
From the log, it seems the redirect has trouble the first time, but the write actually takes effect; on the hook retry no action is taken because "current" is already 0.
If I manually try the same echo redirect command inside this pod it works just fine. Any thoughts?
I was able to reproduce this issue manually, and it looks like the base image is causing it. See the updated note in the discussion thread. For comparison, when I use a centos image and repeat the same test, the problem does not occur.
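Until the base image behaviour is sorted out, one possible workaround is to judge success by the value found in sriov_numvfs after the write, rather than by the exit status of echo. The following is only a sketch under that assumption, not a fix for the root cause. The function name delete_vf_checked is hypothetical; pci_dev_dir and pci are expected to be set by the caller, as in the hook above.

```shell
#!/usr/bin/env bash
# Sketch only: tolerate a write that reports an error but still takes effect.
# delete_vf_checked is a hypothetical name; pci_dev_dir and pci are assumed
# to be set by the caller, as in the original hook.
delete_vf_checked () {
    local numvfs_file="${pci_dev_dir}/${pci}/sriov_numvfs"
    local current
    current=$(cat "${numvfs_file}")
    if (( current > 0 )); then
        echo "**** echo 0 > ${numvfs_file}"
        # Do not let a reported write error fail the hook right away.
        if ! echo 0 > "${numvfs_file}"; then
            echo "write reported an error, re-reading ${numvfs_file}" >&2
        fi
        # Decide success from the value actually present after the write.
        current=$(cat "${numvfs_file}")
        if (( current != 0 )); then
            echo "sriov_numvfs is still ${current}" >&2
            return 1
        fi
    fi
    return 0
}
```

Calling delete_vf_checked in place of delete_vf would let the first run finish successfully when the VFs are in fact removed, and still return a non-zero status when the value stays above 0.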