libopenstorage/stork

cmdexecutor: failed to run multiple commands at the same time

saheienko opened this issue

Is this a BUG REPORT or FEATURE REQUEST?:
BUG REPORT

What happened:
If you run multiple commands against the same pod at the same time, the first command fails with a timeout.

What you expected to happen:
Both commands succeed.

How to reproduce it (as minimally and precisely as possible):
Run a pod and then run a few commands against it:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
spec:
  containers:
  - name: ubuntu
    image: ubuntu:latest
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 3600; done;" ]
EOF

kubectl create serviceaccount admin
kubectl create clusterrolebinding test-default-admin --clusterrole=cluster-admin --serviceaccount=default:admin

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cmd1
spec:
  serviceAccountName: admin
  restartPolicy: Never
  containers:
  - args:
    - -container
    - ubuntu
    - -cmd
    - 'sleep 20 && \${WAIT_CMD}'
    - -taskid
    - cmd1
    - -pod
    - default/ubuntu
    command:
    - /cmdexecutor
    image: openstorage/cmdexecutor:669d6b4
    name: cmdexecutor
EOF

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: cmd2
spec:
  serviceAccountName: admin
  restartPolicy: Never
  containers:
  - args:
    - -container
    - ubuntu
    - -cmd
    - 'sleep 20 && \${WAIT_CMD}'
    - -taskid
    - cmd2
    - -pod
    - default/ubuntu
    command:
    - /cmdexecutor
    image: openstorage/cmdexecutor:669d6b4
    name: cmdexecutor
EOF

Wait for some time and check pods:

kubectl get po
NAME     READY   STATUS      RESTARTS   AGE
cmd1     0/1     Error       0          19m
cmd2     0/1     Completed   0          19m
ubuntu   1/1     Running     0          63m

kubectl logs cmd1
time="2022-05-05T08:23:01Z" level=info msg="Running pod command executor: 2.6.3-669d6b4"
time="2022-05-05T08:23:01Z" level=info msg="Using timeout: 900 seconds"
time="2022-05-05T08:23:01Z" level=info msg="Checking status on command: sleep 20 && ${WAIT_CMD}"
time="2022-05-05T08:23:01Z" level=info msg="check status on pod: [default] ubuntu with backoff: {2s 1 0.1 450 0s} and status file: /tmp/stork-cmd-done-cmd1"
time="2022-05-05T08:23:01Z" level=info msg="Running command: sleep 20 && /tmp/wait.sh on pod: [default] ubuntu"
time="2022-05-05T08:39:38Z" level=fatal msg="status command: stat /tmp/stork-cmd-done-cmd1 failed to run in pod: [default] ubuntu due to timed out waiting for the condition"

kubectl logs cmd2
time="2022-05-05T08:23:03Z" level=info msg="Running pod command executor: 2.6.3-669d6b4"
time="2022-05-05T08:23:03Z" level=info msg="Using timeout: 900 seconds"
time="2022-05-05T08:23:03Z" level=info msg="Checking status on command: sleep 20 && ${WAIT_CMD}"
time="2022-05-05T08:23:03Z" level=info msg="check status on pod: [default] ubuntu with backoff: {2s 1 0.1 450 0s} and status file: /tmp/stork-cmd-done-cmd2"
time="2022-05-05T08:23:03Z" level=info msg="Running command: sleep 20 && /tmp/wait.sh on pod: [default] ubuntu"
time="2022-05-05T08:23:23Z" level=info msg="successfully executed command: sleep 20 && ${WAIT_CMD} on all pods: [default/ubuntu]"

Anything else we need to know?:
Because both commands run in the same target pod, the second executor overwrites /tmp/wait.sh. The first executor's script (which would create /tmp/stork-cmd-done-cmd1) is replaced by cmd2's version, so cmd1's status check never finds its done file and eventually times out:

kubectl exec ubuntu -- ls /tmp
wait.sh

kubectl exec ubuntu -- cat /tmp/wait.sh
touch /tmp/stork-cmd-done-cmd2 && while [ ! -f /tmp/killme-cmd2 ]; do sleep 2; done
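One possible way to avoid the collision (a minimal sketch only, not the actual cmdexecutor implementation; the helper names below are hypothetical) is to scope the wait script path to the task ID, so concurrent executors targeting the same pod write different files:

package main

import "fmt"

// Hypothetical sketch: derive per-task paths so concurrent executors
// targeting the same pod do not overwrite each other's wait script.
// waitScriptPath, doneFilePath, and waitScriptFor are illustrative names,
// not the actual cmdexecutor identifiers.

// waitScriptPath returns a wait-script path unique to the task,
// e.g. /tmp/wait-cmd1.sh instead of the shared /tmp/wait.sh.
func waitScriptPath(taskID string) string {
	return fmt.Sprintf("/tmp/wait-%s.sh", taskID)
}

// doneFilePath mirrors the status file the executor already polls,
// e.g. /tmp/stork-cmd-done-cmd1 as seen in the cmd1 logs above.
func doneFilePath(taskID string) string {
	return fmt.Sprintf("/tmp/stork-cmd-done-%s", taskID)
}

// waitScriptFor builds the script body: create the per-task done file,
// then block until the per-task kill file appears (same shape as the
// /tmp/wait.sh contents shown above, but keyed by task ID).
func waitScriptFor(taskID string) string {
	return fmt.Sprintf(
		"touch %s && while [ ! -f /tmp/killme-%s ]; do sleep 2; done",
		doneFilePath(taskID), taskID)
}

func main() {
	// Two concurrent tasks now get distinct script paths, so neither
	// clobbers the other inside the shared target pod.
	fmt.Println(waitScriptPath("cmd1"), "->", waitScriptFor("cmd1"))
	fmt.Println(waitScriptPath("cmd2"), "->", waitScriptFor("cmd2"))
}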

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others: