nokia/danm

danm-cni dies due to no infinity in busybox sleep

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

bug

feature

What happened:

After installing the CNI plugins, the danm-cni entrypoint script executes sleep infinity. However, Alpine's sleep command comes from busybox, which does not accept infinity as a duration, so the command exits with an error and the Pod crashes.

What you expected to happen:

The Pod should just sleep eternally.

How to reproduce it:

Run the current danm-cni-plugins container.

Anything else we need to know?:

I fixed the problem locally by adding coreutils to the packages installed on Alpine, which replaces the busybox sleep with the GNU version, which accepts infinity. A better solution would be to use a proper while loop in the entrypoint script.
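A minimal sketch of what such a loop could look like, assuming a POSIX sh entrypoint. The function name keep_alive and its two parameters are hypothetical and not taken from the danm repository. The optional iteration limit exists only so the loop can be exercised in a test.

```shell
#!/bin/sh
# Hypothetical helper (not part of danm): block in fixed-length sleeps
# instead of relying on "sleep infinity".
#   $1  seconds per sleep (default 3600)
#   $2  maximum number of iterations; 0 (default) means loop forever
keep_alive() {
    interval="${1:-3600}"
    max_iterations="${2:-0}"
    count=0
    while :; do
        # Propagate a failing sleep (e.g. an invalid interval) to the caller.
        sleep "$interval" || return 1
        count=$((count + 1))
        if [ "$max_iterations" -gt 0 ] && [ "$count" -ge "$max_iterations" ]; then
            return 0
        fi
    done
}
```

An entrypoint script would end with a call such as keep_alive 3600. Because each sleep uses a plain numeric duration, this works with both the busybox and the coreutils implementation.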

Environment:

  • DANM version (use danm -version): git rev 707cc96 (current master)
  • Kubernetes version (use kubectl version): v1.18.0
  • DANM configuration (K8s manifests, kubeconfig files, CNI config file): stock from danm-installer
  • OS (e.g. from /etc/os-release): Talos v0.6.0-dev
  • Kernel (e.g. uname -a): 5.5.15-talos
  • Others:

@Ulexus thanks for the issue!
@carstenkoester could you correct this too in your current PR? I think it is better to install the correct package into the container, because otherwise, even with the maximum sleep value busybox accepts, we will see restarts, which might make users wonder whether there is an error with the DS.

Reading this made me wonder why I didn't run into this issue when testing the CNI daemonset, especially since I clearly remember that sleep infinity limitation from the past.

It seems that this must have changed between the busybox versions used in alpine 3.10 vs 3.11:

$ docker run --rm -it alpine:3.10 apk list busybox
busybox-1.30.1-r3 x86_64 {busybox} (GPL-2.0) [installed]
$ docker run --rm -it alpine:3.10 sleep infinity
sleep: invalid number 'infinity'


$ docker run --rm -it alpine:3.11 apk list busybox
busybox-1.31.1-r9 x86_64 {busybox} (GPL-2.0-only) [installed]
$ docker run --rm -it alpine:3.11 sleep infinity
(... sleeping, beautifully...)

Busybox changelog confirms that a change titled 'sleep: support "inf"' went in between 1.30.1 and 1.31.0 - so that's good, consistent with the observed result :)
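The manual comparison above could also be scripted so that it does not block a terminal. This is only a sketch: the helper name stays_running is hypothetical, and the one-second grace period is an arbitrary assumption. It starts a command in the background and reports whether it is still alive shortly afterwards.

```shell
#!/bin/sh
# Hypothetical helper (not part of danm): succeed if the given command is
# still running after a short grace period, fail if it exited right away.
stays_running() {
    "$@" >/dev/null 2>&1 &
    pid=$!
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        # Still alive, so the command accepted its arguments. Clean it up.
        kill "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null || true
        return 0
    fi
    # Already gone, as with a sleep that rejects 'infinity'.
    wait "$pid" 2>/dev/null || true
    return 1
}
```

Inside a container, stays_running sleep infinity would then succeed only where the installed sleep accepts infinity.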

So the "quick way out" might be just to prescribe alpine 3.11. It seems that this image was built on a machine that had previously pulled an older version of the alpine image? Would that be an acceptable solution?

Ah, yes. I should have thought of that. That's a much simpler solution.

Considering we already use the "latest" tagged image in our code, I'm closing this issue.