nestybox/sysbox-ee

Issues running a Rancher container inside a Sysbox container

sanzenwin opened this issue · 10 comments

docker run --name=test -it nestybox/ubuntu-bionic-systemd-docker:latest
docker exec -it test bash

# Inside the Sysbox container
docker run --privileged rancher/rancher

ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes

@sanzenwin, unfortunately Sysbox doesn't support Rancher/K3s yet, but this is something that we will be adding fairly soon.

Can you please let me know what's the use-case that you have in mind for Sysbox? Maybe we can offer you a workaround.

@rodnymolina, Rancher is easier to deploy than K8s; I'm trying to test it on a single machine. I will focus on your other project:
https://github.com/nestybox/kindbox

Thanks @sanzenwin for reporting the issue.

docker run --privileged rancher/rancher
ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes

It seems the rancher container entrypoint is looking for the presence of /dev/kmsg and it's not finding it:

if [ ! -e /run/secrets/kubernetes.io/serviceaccount ] && [ ! -e /dev/kmsg ]; then
    echo "ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes"
    exit 1
fi

It's strange because /dev/kmsg is exposed inside the parent Sysbox container:

sysbox-container: /# ls -l /dev/kmsg
crw-rw-rw- 1 nobody nogroup 1, 3 Apr 21 19:25 /dev/kmsg

Thus we would expect that running a privileged container inside the Sysbox container would also expose that device:

sysbox-container: /# docker run --privileged ubuntu:18.04 ls -l /dev/kmsg
ls: cannot access '/dev/kmsg': No such file or directory

We need to dig into why that is the case. I suspect the Docker instance running inside the Sysbox container did not like the "nobody:nogroup" on /dev/kmsg and as a result did not pass it into the inner Rancher container.
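If that suspicion is right, a numeric listing should show the overflow ids; nobody:nogroup just means the device node's host owner (root) has no mapping in the container's user namespace (a quick diagnostic sketch, not captured here):

sysbox-container: /# ls -ln /dev/kmsg        # nobody:nogroup shows as the overflow ids 65534:65534
sysbox-container: /# cat /proc/self/uid_map  # the host uid ranges actually mapped into this container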

Fortunately it's easy to work around this by passing the device into the container explicitly with --device:

sysbox-container: /# docker run --privileged --device /dev/kmsg:/dev/kmsg -it rancher/rancher 

That causes the Rancher container to initialize. I am not familiar with Rancher (yet) so I can't tell whether it initialized correctly, but it appears it did.
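As a sanity check, repeating the earlier ls probe with the same --device flag should now find the device (a sketch; I haven't captured the output here):

sysbox-container: /# docker run --privileged --device /dev/kmsg:/dev/kmsg ubuntu:18.04 ls -l /dev/kmsg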

Please give that a try and let us know.

Thanks!

Good idea!

I see the k3s control-plane coming up, but there are a few errors being dumped by Rancher, so I'm not sure how reliable this will be until we fully test it in our setups.

@streamnsight, let us know how it goes with Cesar's workaround.

root@441df534ab82:/var/lib/rancher# k3s kubectl get all --all-namespaces
NAMESPACE       NAME                                    READY   STATUS      RESTARTS   AGE
cattle-system   pod/helm-operation-7trtw                0/2     Completed   0          17m
cattle-system   pod/helm-operation-wlbm2                0/2     Completed   0          16m
fleet-system    pod/fleet-agent-66c54576c6-5gtqh        1/1     Running     0          12m
fleet-system    pod/fleet-controller-78b7d7d9cf-rddlw   1/1     Running     0          15m
fleet-system    pod/gitjob-6d5565ffb-jthn5              1/1     Running     0          15m
kube-system     pod/coredns-7944c66d8d-bf284            1/1     Running     0          17m

NAMESPACE      NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                  AGE
default        service/kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP                  17m
fleet-system   service/gitjob       ClusterIP   10.43.229.23   <none>        80/TCP                   15m
kube-system    service/kube-dns     ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP,9153/TCP   17m

NAMESPACE      NAME                               READY   UP-TO-DATE   AVAILABLE   AGE
fleet-system   deployment.apps/fleet-agent        1/1     1            1           14m
fleet-system   deployment.apps/fleet-controller   1/1     1            1           15m
fleet-system   deployment.apps/gitjob             1/1     1            1           15m
kube-system    deployment.apps/coredns            1/1     1            1           17m

NAMESPACE      NAME                                          DESIRED   CURRENT   READY   AGE
fleet-system   replicaset.apps/fleet-agent-5f8bc46697        0         0         0       14m
fleet-system   replicaset.apps/fleet-agent-66c54576c6        1         1         1       12m
fleet-system   replicaset.apps/fleet-controller-78b7d7d9cf   1         1         1       15m
fleet-system   replicaset.apps/gitjob-6d5565ffb              1         1         1       15m
kube-system    replicaset.apps/coredns-7944c66d8d            1         1         1       17m
root@441df534ab82:/var/lib/rancher#
#sysbox-container:
docker run --privileged -p 80:80 -p 443:443 --device /dev/kmsg:/dev/kmsg -it rancher/rancher

#host machine:
curl 192.168.11.101 # the IP of the host machine
#curl: (7) Failed to connect to 192.168.11.101 port 80: Connection refused

#docker:dind container:
docker run --privileged -p 80:80 -p 443:443 -it rancher/rancher

#host machine:
curl 192.168.11.101 # the IP of the host machine
#<a href="https://192.168.11.101/">Found</a>.

There seems to be a network issue here.
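One thing worth double-checking (a guess, since I don't know how the dind container itself was started): with nested Docker, ports must be published at both levels, so the outer Sysbox container also needs -p for 80/443, roughly:

#host machine:
docker run --name=test -p 80:80 -p 443:443 -it nestybox/ubuntu-bionic-systemd-docker:latest

#sysbox-container:
docker run --privileged -p 80:80 -p 443:443 --device /dev/kmsg:/dev/kmsg -it rancher/rancher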

I am trying to deploy Rancher on a single node, build cluster environments for a dev server, a testing server, and so on, and finally deploy it to production (single-node or multi-node). docker:dind is suggested for test environments only, so I want to use Sysbox and deploy it to production.

My steps:

  1. Build a Sysbox container as the master and run Rancher in it; Rancher will build a default cluster (see the sketch below).
  2. Build a Sysbox container as a node, run Rancher-Agent in it, and build a new cluster.
  3. Repeat step 2 to build more clusters.
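A rough sketch of step 1, using only commands already shown in this thread (the container name is illustrative, and the Rancher-Agent side of steps 2 and 3 is untested):

#host machine:
docker run -d --name=master -p 80:80 -p 443:443 nestybox/ubuntu-bionic-systemd-docker:latest
docker exec master docker run -d --privileged -p 80:80 -p 443:443 --device /dev/kmsg:/dev/kmsg rancher/rancher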

Can you offer me a workaround?

We also would love to have support for this. Is there any progress on this issue?

Also, we found that the rancher image fails to extract on Docker running inside a Sysbox container:

inside sysbox container > docker run --privileged --device /dev/kmsg:/dev/kmsg -it rancher/rancher:v2.6-head
Unable to find image 'rancher/rancher:v2.6-head' locally
v2.6-head: Pulling from rancher/rancher
fa7b56d5c338: Pull complete
831c06a19f1c: Pull complete
9b07d273a2f4: Pull complete
5d7ac9c67454: Pull complete
4fede13eeff9: Pull complete
1ead93fe9b8f: Pull complete
06d4f82e466f: Pull complete
c545b5ac0e22: Pull complete
32c21992ee2f: Pull complete
b00b142b2e37: Pull complete
1a688fd93915: Pull complete
da7f4e0805f8: Extracting [==================================================>]   10.4MB/10.4MB
b860e69a3c7f: Download complete
3fb82d2cef36: Download complete
2810732276c3: Download complete
da305018a6f1: Download complete
8ccee01d29a6: Download complete
209521d1a443: Download complete
a14c7eef703a: Download complete
58be7072bca2: Download complete
docker: failed to register layer: ApplyLayer exit status 1 stdout:  stderr: lchown /usr/bin/etcdctl: invalid argument.
See 'docker run --help'.

It pulls and starts Rancher with a dind setup.

Some context: we're running a Sysbox 0.5.x setup. All files within /var/lib/docker are accessible and all belong to normal users:

inside sysbox container > sudo find /var/lib/docker | xargs -n 128 sudo ls -la | grep nobody | less
# empty
inside sysbox container > sudo find /var/lib/docker -user 65534
# empty
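For what it's worth, lchown: invalid argument during layer extraction usually means the layer contains a file owned by a uid/gid that has no mapping in the current user namespace. A quick way to see the mapping the inner Docker has to work with (a diagnostic sketch, not a fix):

inside sysbox container > cat /proc/self/uid_map /proc/self/gid_map
# if the layer's /usr/bin/etcdctl carries an id outside these ranges, the chown cannot be represented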

@aisbaa, I don't remember seeing this error in this context (I just reproduced it), so it must be something new that we will need to look into.

Having said that, what's the use-case that you have in mind? Do you need the rancher-server to operate within a Sysbox container, or would it suffice to have any of its components (e.g., k3s, rke, rke2)? I'm asking because the latter ones should work fine.

Having said that, what's the use-case that you have in mind?

We're using Kubernetes pods as development environments for our engineers; we call those devpods. Currently we're using k3d as the development environment for Kubernetes.

Do you need the rancher-server to operate within a Sysbox container, or would it suffice to have any of its components (e.g., k3s, rke, rke2)? I'm asking because the latter ones should work fine.

The end goal is to find a working configuration for k3d or another tool that can run Kubernetes inside Docker. I tried running the default k3d configuration and it failed due to open /dev/kmsg: no such file or directory:

devpod> k3d version
k3d version v5.4.3
k3s version v1.23.6-k3s1 (default)

devpod> k3d cluster create mycluster
...

devpod> docker ps -a
CONTAINER ID   IMAGE                            COMMAND                  CREATED          STATUS                          PORTS                             NAMES
0c8675d3f3c9   ghcr.io/k3d-io/k3d-proxy:5.4.3   "/bin/sh -c nginx-pr…"   12 minutes ago   Up 12 minutes                   80/tcp, 0.0.0.0:45659->6443/tcp   k3d-mycluster-serverlb
422fc2cf02ab   rancher/k3s:v1.23.6-k3s1         "/bin/k3s server --t…"   12 minutes ago   Restarting (1) 12 seconds ago                                     k3d-mycluster-server-0

devpod> docker logs -f k3d-mycluster-server-0 2>&1 | tail
...
I0808 18:30:10.230478      32 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
E0808 18:30:10.230535      32 kubelet.go:496] "Failed to create an oomWatcher (running in UserNS, Hint: enable KubeletInUserNamespace feature flag to ignore the error)" err="open /dev/kmsg: no such file or directory"
E0808 18:30:10.230556      32 server.go:298] "Failed to run kubelet" err="failed to run Kubelet: failed to create kubelet: open /dev/kmsg: no such file or directory"
E0808 18:30:10.230855      32 node.go:152] Failed to retrieve node info: nodes "k3d-mycluster-server-0" not found
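The kubelet hint in that log suggests two things that might be worth trying (untested sketches; k3d v5 flag syntax assumed):

devpod> k3d cluster create mycluster --volume /dev/kmsg:/dev/kmsg
# or follow the log's own hint and tell kubelet to tolerate the missing device:
devpod> k3d cluster create mycluster --k3s-arg '--kubelet-arg=feature-gates=KubeletInUserNamespace=true@server:*'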

Having said that, I've noticed that Sysbox should support k0s, which I don't recall whether we evaluated. So we might be able to swap k3d for k0s.
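If k0s ends up being the path, the upstream k0s-in-docker invocation looks roughly like this (taken from the k0s docs, which recommend --privileged; under Sysbox that flag should be unnecessary, but we haven't tested it):

devpod> docker run -d --name k0s --hostname k0s -p 6443:6443 docker.io/k0sproject/k0s:latest  # pin a real version tag in practice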

P.S. Sorry for conflating the docker pull issue with the /dev/kmsg one.
P.P.S. I've only tried the Community Edition of Sysbox.

@aisbaa, yes, we're aware of this issue with k3d. You're not the first one to ask (link), so we'll try to have a fix for that ASAP.