iximiuz/cdebug

chroot can't execute 'sh': No such file or directory

rejoshed opened this issue · 18 comments

Hey, have a crossplane provider-aws pod running and I'm trying to get into the darn thing. Unfortunately even using this program I still get

chroot can't execute 'sh': No such file or directory

The image comes from here: https://github.com/crossplane-contrib/provider-aws

The image seems to be from scratch, but I can't really follow their build process. It's quite complicated.

In this case I've ssh'd to the Node the pod is on and attempted to run your program, but with no luck. 😢

Thanks for this and your article btw. It's all quite amazing! I feel like you're my spirit animal. I'm often finding you've already trekked a similar path and blogged it when I'm attempting something.

Holy cow!

https://github.com/Mic92/cntr did work!

Thank you for building your program, but also super thank you for documenting alternatives!

Hi @rejoshed! Thanks for the report (and for the kind words)! 😄

Could you maybe share the complete cdebug command that you used? And also, having the output of kubectl get pods <crossplane-provider-aws> -o yaml would be helpful. I suspect it might be running with a read-only rootfs.

Thanks!

Hey, I think that was at least one of the issues. The deployment by crossplane is very locked down.

I'm sorry I haven't had much of a chance to check back into this just yet, but hopefully soon. : )

No rush at all, take your time!

+1 with the same symptoms.

ONLY with containers from images like this:

[root@ip-192-168-1-11 cdebug-0.0.8]# docker history 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5
IMAGE          CREATED         CREATED BY                                    SIZE      COMMENT
4643bd56bc8c   20 months ago   ENTRYPOINT ["/pause"]                         0B        buildkit.dockerfile.v0
<missing>      20 months ago   USER 65535:65535                              0B        buildkit.dockerfile.v0
<missing>      20 months ago   ADD bin/pause-linux-arm64 /pause # buildkit   484kB     buildkit.dockerfile.v0
<missing>      20 months ago   ARG ARCH                                      0B        buildkit.dockerfile.v0

[root@ip-192-168-1-11 cdebug-0.0.8]# cat /etc/os-release 
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"

Strangely, an older version has the same symptoms. Even switching from ARM64 to AMD64 changes nothing.

[root@ip-192-168-2-249 cdebug-0.0.4]# ./cdebug exec -it --privileged 1e97d3ebd3e1
Pulling debugger image...
latest: Pulling from library/busybox
Digest: sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c
Status: Image is up to date for busybox:latest
chroot: can't execute 'sh': No such file or directory

All versions are built from source.

Other containers (which contain a shell, BTW) are fine, but the idea is to debug the "no-shellies" :)

Have you folks tried https://github.com/iximiuz/cdebug/releases/tag/v0.0.9? There were two fixes related to this problem.

Nope. Same thing.

[root@ip-192-168-1-239 cdebug-0.0.9]# ./cdebug exec -it --privileged 1d65808c0b32
Pulling debugger image...
latest: Pulling from library/busybox
205dae5015e7: Pull complete 
Digest: sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c
Status: Downloaded newer image for busybox:latest
chroot: can't execute 'sh': No such file or directory

[root@ip-192-168-1-239 cdebug-0.0.9]# docker ps | grep 1d65808c0b32
1d65808c0b32   602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.5                   "/pause"                 39 minutes ago       Up 39 minutes                 k8s_POD_aws-node-termination-handler-vkcdm_kube-system_451853dd-01fc-4035-8c11-71953a07e83a_0

@kappa8219 do you know what's the architecture of that pause image? Based on the busybox digest from the snippet above, you're on an arm64 machine. But can it be that this 1d65808c0b32 container is amd64?

It is AMD64; the first one was ARM indeed. But both hosts are functioning k8s nodes, so there should not be an arch mismatch.

Just in case, could you try running cdebug exec -it --privileged --platform <platform> <target> where the platform is either arm64 or amd64, and it matches the platform of the target container?
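
For anyone following along, here is a minimal sketch of how the host side of that comparison could be derived. It only maps the `uname -m` output to the `amd64`/`arm64` names used with `--platform`; the function name `machine_to_platform` is hypothetical and not part of cdebug.

```shell
#!/bin/sh
# Map a kernel machine name (as printed by `uname -m`) to the
# architecture name used in Docker platform strings.
# NOTE: machine_to_platform is a hypothetical helper, not a cdebug feature.
machine_to_platform() {
  case "$1" in
    x86_64|amd64)  echo "amd64" ;;
    aarch64|arm64) echo "arm64" ;;
    *)             echo "unknown"; return 1 ;;
  esac
}

# Host architecture in Docker terms (stays empty-ish "unknown" on other CPUs).
host_platform=$(machine_to_platform "$(uname -m)")
echo "host platform: ${host_platform}"
```

The target's side can usually be read with `docker image inspect --format '{{.Architecture}}' <image>`. If both values agree, that is the value to pass to `--platform`.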

[root@ip-192-168-1-91 cdebug-main]# ./cdebug exec -it --privileged --platform amd64 b5bafa5ed1a3
Pulling debugger image...
latest: Pulling from library/busybox
Digest: sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c
Status: Image is up to date for busybox:latest
chroot: can't execute 'sh': No such file or directory
[root@ip-192-168-1-91 cdebug-main]# uname -m
x86_64

I hope I'm not bothering you too much, since these pause containers are just dummy processes in a pod. But I'm curious what causes such errors.

[root@ip-192-168-1-91 cdebug-main]# docker ps | grep dkgc4
8be8b1a5588f   quay.io/prometheus/node-exporter                                        "/bin/node_exporter …"   9 months ago    Up 9 months              k8s_node-exporter_monitoring-node-exporter-prometheus-node-exporter-dkgc4_monitoring_8231267f-e869-4073-808d-160b20055636_0
b5bafa5ed1a3   602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/pause:3.1-eksbuild.1   "/pause"                 9 months ago    Up 9 months              k8s_POD_monitoring-node-exporter-prometheus-node-exporter-dkgc4_monitoring_8231267f-e869-4073-808d-160b20055636_0

The second, "workload" container (also a no-shellie) is absolutely fine with cdebug:

[root@ip-192-168-1-91 cdebug-main]# ./cdebug exec -it --privileged --platform amd64 8be8b1a5588f
Pulling debugger image...
latest: Pulling from library/busybox
Digest: sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c
Status: Image is up to date for busybox:latest
/ # 
/ #  

docker exec -it 8be8b1a5588f bash
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "bash": executable file not found in $PATH: unknown

So this issue is more of a "detective story" than a real use case, IMHO.

What I did: extracted the "guts" of this pause image. Just for brainstorming %)

{
  "architecture": "amd64",
  "config": {
    "Hostname": "",
    "Domainname": "",
    "User": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": null,
    "Image": "sha256:7608ec41f6dcdfcfa3a0e625fc14976fcd6257f58baa48288609c413e9aecde4",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": [
      "/pause"
    ],
    "OnBuild": null,
    "Labels": null
  },
  "container": "1dfe01174b90ef08a62f969ac3e111bde5eba064e61d0050e688e7ff4723252b",
  "container_config": {
    "Hostname": "1dfe01174b90",
    "Domainname": "",
    "User": "",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "/bin/sh",
      "-c",
      "#(nop) ",
      "ENTRYPOINT [\"/pause\"]"
    ],
    "Image": "sha256:7608ec41f6dcdfcfa3a0e625fc14976fcd6257f58baa48288609c413e9aecde4",
    "Volumes": null,
    "WorkingDir": "",
    "Entrypoint": [
      "/pause"
    ],
    "OnBuild": null,
    "Labels": {}
  },
  "created": "2020-07-10T18:19:35.63107296Z",
  "docker_version": "19.03.11",
  "history": [
    {
      "created": "2020-07-10T18:19:35.328927755Z",
      "created_by": "/bin/sh -c #(nop)  ARG ARCH",
      "empty_layer": true
    },
    {
      "created": "2020-07-10T18:19:35.500888129Z",
      "created_by": "/bin/sh -c #(nop) ADD file:c9c01c3ad66a142eb9c65a6cea2ae4e039fb3f0f0d1dace44ebc4921e5b8194f in /pause "
    },
    {
      "created": "2020-07-10T18:19:35.63107296Z",
      "created_by": "/bin/sh -c #(nop)  ENTRYPOINT [\"/pause\"]",
      "empty_layer": true
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:4548548ec9f2c5096713a22d782fb56f1f6f523072a4b323b1963171d09c1b4d"
    ]
  }
}
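
For the platform question, the relevant part of the dump above is the top-level `architecture` field. A rough sketch of pulling it out without extra tools follows. It assumes the config is pretty-printed with the field on its own line, as above, and the `image_arch` helper name is made up for illustration.

```shell
#!/bin/sh
# Print the top-level "architecture" value from an image config JSON on stdin.
# ASSUMPTION: the JSON is pretty-printed, one field per line, as in the dump.
# NOTE: image_arch is a hypothetical helper name.
image_arch() {
  sed -n 's/^[[:space:]]*"architecture":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Trimmed-down sample with the same shape as the pause image config.
sample='{
  "architecture": "amd64",
  "os": "linux"
}'

printf '%s\n' "$sample" | image_arch
```

Where `jq` is available, `jq -r .architecture` does the same thing more robustly.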

Thanks for the detailed report! I haven't had the time to analyze it thoroughly, but here is a new release that fixes another related issue: v0.0.11. Could you give it a try?

Also, could you try the --image nixery.dev/shell flag (combining it with the --platform flag for more certainty)?

Now some other error:
exec /bin/sh: exec format error

[root@ip-192-168-1-192 cdebug-main]# ./cdebug exec -it --privileged --platform arm64 --image nixery.dev/shell bff11ad6cc7e
Pulling debugger image...
latest: Pulling from shell
Digest: sha256:98eb4fbb8ee9659cd686b2f3f32513f3b4a0d95a006b8e178b8d2a67b3f6bc08
Status: Image is up to date for nixery.dev/shell:latest
WARNING: image with reference nixery.dev/shell was found but does not match the specified platform: wanted linux/arm64, actual: linux/amd64
exec /bin/sh: exec format error

UPD:
It seems that nixery shell is not multiarch, just x64.

UPDATE:

It was an arm node and now I tried AMD64. And it works!

[root@ip-192-168-3-201 cdebug-main]# ./cdebug exec -it --privileged --platform amd64 --image nixery.dev/shell 7f2fa2da9380
Pulling debugger image...
latest: Pulling from shell
Digest: sha256:98eb4fbb8ee9659cd686b2f3f32513f3b4a0d95a006b8e178b8d2a67b3f6bc08
Status: Image is up to date for nixery.dev/shell:latest
bash-5.2#

"Pause" hacked! Hurray! Thanks.

Hi Oleksii! Sorry for the late reply, busy times 🙈 I'm on vacation next week, so hopefully I'll have more time to dig into it. Thanks for all your detailed reports!

It took me exactly one year, but there is a good chance that the recent 0.0.17 release fixes this issue 🙈 If anyone in this thread is still interested, please give it a try.