getumbrel/umbrel

`docker exec` seems to be broken on UmbrelOS 1.0 (RPi)

AmitAronovitch opened this issue · 3 comments

  • Cannot start any interactive process/shell inside running containers using docker exec -it {container} {args} (fails with a missing-device error).
  • Cannot start any executable from the image filesystem inside running containers, even non-interactively, using docker exec {container} {command} (fails with "command not found" or "no such file or directory").

How to reproduce:

sudo docker run -d --rm --name docker_test busybox sleep 1000
sudo docker exec -it docker_test sh

Expected result (on other Linux distros, including Umbrel 0.5.x):

Interactive busybox shell in the container...

Result on Umbrel 1.0:

OCI runtime exec failed: exec failed: unable to start container process: open /dev/ptmx: no such device: unknown

System details:

  • Device: Raspberry Pi 4
  • umbrelOS 1.0.4 (recently upgraded from 0.5.x)
$ uname -a
Linux umbrel 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux

Implications:

Cannot use the terminal to run the CLI commands of installed applications (e.g. lncli for a Lightning node).

My analysis (will post details below):

It seems that processes spawned by docker exec have "escaped" the chroot jail: "/" is mounted to the system's base root rather than the container's.

Adding details of my diagnosis, for further information and to help debugging:

Setup:

# start a container and exec a new process in it
sudo docker run -d --rm --name docker_test busybox sleep 1000
sudo docker exec -d docker_test sleep 1000

# find out the pid of the main process and the subprocess
main_pid=$(sudo docker inspect docker_test | jq -r '.[] | .State.Pid')
main_ppid=$(sudo cat /proc/${main_pid}/status | grep PPid | awk '{print $2}')
sub_pid=$(pgrep -P "$main_ppid" | grep -vx "$main_pid")

Now, look at the root mounts of the processes:

$ sudo cat /proc/${main_pid}/mounts | head -1 
overlay / overlay rw,relatime,lowerdir=/var/lib/docker/ ... ... /work 0 0

(Long output truncated. Root is mounted to the container's overlay, with all the layers listed.)

$ sudo cat /proc/${sub_pid}/mounts | head -1
/dev/root / ext4 ro,relatime 0 0

For the subprocess, root is the ext4 filesystem on /dev/root (presumably a physical disk partition). This happens for me only on Umbrel 1.0. Repeating the same commands on other Linux distros, the subprocess shows the same result as the main process above (root mounted to the container's overlay).
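The comparison above can be wrapped in a small helper so it is easy to rerun after an upgrade. This is a sketch assuming bash, awk, and the standard /proc/<pid>/mounts field order (source, mount point, fstype, options, dump, pass); the function names `root_mount` and `check_exec_root` are hypothetical, not part of Docker or umbrelOS.

```shell
#!/usr/bin/env bash
# Sketch: report which filesystem a process sees mounted at "/".
# Assumes the /proc/<pid>/mounts format: source mountpoint fstype options dump pass.

# root_mount FILE -> prints "<source> <fstype>" for the "/" entry in FILE.
# If several mounts are stacked on "/", the last one listed is the visible one.
root_mount() {
  awk '$2 == "/" { src = $1; fs = $3 } END { if (fs != "") print src, fs }' "$1"
}

# check_exec_root MAIN_PID SUB_PID -> prints both roots; returns 1 if they differ.
# Hypothetical helper; reading another process's mounts typically needs root.
check_exec_root() {
  local main sub
  main=$(root_mount "/proc/$1/mounts")
  sub=$(root_mount "/proc/$2/mounts")
  echo "main process root: $main"
  echo "exec'd process root: $sub"
  if [ "$main" != "$sub" ]; then
    echo "WARNING: exec'd process does not share the container's root" >&2
    return 1
  fi
}
```

In a root shell where main_pid and sub_pid are set as in the setup above, `check_exec_root "$main_pid" "$sub_pid"` would be expected to print `overlay overlay` for the main process and, on an affected system, `/dev/root ext4` for the exec'd one.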

Note that the container's layer filesystem IS in fact mounted in the subprocess's view, but not at the root:

$ layerhash=$(sudo cat /proc/${main_pid}/mounts | head -1 | grep -o 'upperdir=.*/diff' | awk -F/ '{print $(NF-1)}')
$ sudo cat /proc/${sub_pid}/mounts | grep $layerhash
overlay /run/rugpi/mounts/data/overlay/root overlay rw,relatime,lowerdir=/ ... ... /work 0 0

It looks like the same overlay that is mounted as root for the main process is mounted at /run/rugpi/mounts/data/overlay/root for the subprocess, but the subprocess is not chrooted into it.
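The layer lookup above can be expressed the same way: take the overlay2 layer ID from the upperdir of the main process's root entry, then list every mount in the exec'd process's view whose line mentions that ID. This is a sketch assuming bash, awk, and Docker's overlay2 layout (`.../overlay2/<id>/diff`); the function names `layer_hash` and `mounts_with` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: locate the container's overlay layer in another process's mount view.
# Assumes overlay2 upperdir paths of the form .../overlay2/<id>/diff.

# layer_hash FILE -> prints the <id> from the upperdir of the overlay mounted at "/".
layer_hash() {
  awk '$2 == "/" && $3 == "overlay" {
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++)
           if (opts[i] ~ /^upperdir=/) {
             m = split(opts[i], parts, "/")
             print parts[m - 1]   # component just before "diff"
           }
       }' "$1"
}

# mounts_with FILE ID -> prints "<mountpoint> <fstype>" for every line containing ID
# (plain substring match, like the grep in the original command).
mounts_with() {
  awk -v pat="$2" 'index($0, pat) { print $2, $3 }' "$1"
}
```

As root, `mounts_with "/proc/$sub_pid/mounts" "$(layer_hash "/proc/$main_pid/mounts")"` would list where the layer shows up for the exec'd process; on an affected system this is expected to be a path other than `/`.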

This opens the way for a workaround I can use until this is fixed. I posted it on the community website as a way to use the LND CLI on my RPi 4 running Umbrel 1.0:
https://community.umbrel.com/t/why-is-docker-compose-not-working-anymore-with-umbrel-1-0/15756/32

Heya @AmitAronovitch thanks so much for taking the time to dig into this and post your findings, really appreciate it!

I can confirm the issue doesn't occur on our amd64 builds for Umbrel Home, it looks like it's to do with the build system we use for Raspberry Pi builds and the way it mounts the filesystem as you pointed out.

Working on a fix for the next umbrelOS release, thanks again! 💜

I can confirm that this is solved in UmbrelOS 1.1 👍

Upgrading to rugpi v0.6.5 did the job.
It seems it was fixed in rugpi by using pivot_root instead of chroot (as Docker does).

Thanks @lukechilds and thanks @koehlma 😄