flant/shell-operator

shell script stdout redirect issue

jianzzha opened this issue · 2 comments

Discussed in #341

Originally posted by jianzzha December 8, 2021
Hi, first of all, this is a very cool project and I started to love it on my first trial :)

I'm trying to use it to simulate what a sriov operator would do: write to a /sys file based on the VF setting described in a CR. Here is a script sample:

delete_vf () {
   current=$(cat ${pci_dev_dir}/${pci}/sriov_numvfs)
   if ((current > 0)); then
       echo "**** echo 0 > ${pci_dev_dir}/${pci}/sriov_numvfs"
       echo 0 > ${pci_dev_dir}/${pci}/sriov_numvfs
   fi
}

and part of the main body looks like:

    numvfs=$(jq -r '.[0].object.spec.numvfs' ${BINDING_CONTEXT_PATH})
    pci=$(jq -r '.[0].object.spec.pci' ${BINDING_CONTEXT_PATH})      
    pci=$(unify_pci_addr $pci)                                       
    echo "########################## numvfs=${numvfs}, pci=${pci}"   
    if [[ ! -e ${pci_dev_dir}/${pci} ]]; then                        
        echo "!!!!!!! not exist: ${pci_dev_dir}/${pci}"              
        exit 0                                                       
    fi                                                               
    watchEvent=$(jq -r '.[0].watchEvent' ${BINDING_CONTEXT_PATH})    
    if [[ "${watchEvent}" == "Deleted" ]]; then                    
        echo "***** delete VFs"                                      
        delete_vf                                                                                      
    fi

Basically, when the CR is deleted, the shell script runs: echo 0 > ${pci_dev_dir}/${pci}/sriov_numvfs

The first time the delete event is received, this always errors out:

{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"########################## numvfs=3, pci=0000:af:00.1","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:55:55Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"***** delete VFs","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:55:55Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"**** echo 0 \u003e /sys/bus/pci/devices/0000:af:00.1/sriov_numvfs","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:55:55Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"/hooks/crd-hook.sh: line 17: echo: write error: Invalid argument","output":"stderr","queue":"main","task":"HookRun","time":"2021-12-08T20:55:57Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"error","msg":"Hook failed. Will retry after delay. Failed count is 1. Error: crd-hook.sh FAILED: exit status 1","queue":"main","task":"HookRun","time":"2021-12-08T20:55:57Z"}

The hook is then retried and succeeds:

#################### Event","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"########################## numvfs=3, pci=0000:af:00.1","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"***** delete VFs","output":"stdout","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"Usage: \u0026{Sys:7.386ms User:10.037ms MaxRss:34856}","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}
{"binding":"kubernetes","event":"kubernetes","hook":"crd-hook.sh","level":"info","msg":"Hook executed successfully","queue":"main","task":"HookRun","time":"2021-12-08T20:56:02Z"}

From the log it seems the redirect has trouble the first time, but the write actually takes effect. On the hook retry no action is taken because "current" is already 0.

If I manually try the same echo redirect command inside this pod it works just fine. Any thoughts?

I was able to reproduce this issue manually, and it looks like the base image is causing it. See the updated note in the discussion thread. As a comparison, when I use a centos image and repeat the same test, the problem does not occur.

'echo -n' is the answer. Credit goes to @diafour
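
For anyone landing here later, this is roughly what delete_vf could look like with that fix applied. It is only a sketch based on the snippet above, not the exact script that was used: the numvfs_file variable and the quoting are additions for illustration, and pci_dev_dir and pci are assumed to be set by the caller as in the original hook. With -n, echo writes only the character 0 and no trailing newline.

```shell
# Sketch of delete_vf using `echo -n`, as suggested in the thread.
# pci_dev_dir and pci must be set by the caller, as in the original hook.
# numvfs_file is a helper variable introduced here for readability.
delete_vf () {
    numvfs_file="${pci_dev_dir}/${pci}/sriov_numvfs"
    current=$(cat "${numvfs_file}")
    if [ "${current}" -gt 0 ]; then
        echo "**** echo -n 0 > ${numvfs_file}"
        # -n suppresses the trailing newline, so only "0" is written
        echo -n 0 > "${numvfs_file}"
    fi
}
```

To see the difference in what gets written, compare `echo 0 | od -c` with `echo -n 0 | od -c`: the first shows a trailing \n, the second does not.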