containers/youki

Support for rootless container

utam0k opened this issue · 23 comments

I don't know anything about rootless containers yet. I'd like to use this issue to gather references and think about the design. I'm also looking for people who want to take this on.

References

I am interested in looking into this.

@Furisto
I will assign you to this issue.
However, I don't know how big this issue will be. Where would you like to start? You can start with the implementation, but I don't know yet whether youki has what it takes to implement it.

@utam0k
We will need to delegate the creation of the cgroups to systemd. That effort is luckily already underway. We also need to use user namespaces and uid and gid maps to map the root user to a non-root user on the host. I need to read up on the details.
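
For reference, a minimal Rust sketch of the mapping step described above, assuming the child has already entered a new user namespace and the parent knows its PID. IdMapping, format_id_map, and write_id_maps are hypothetical names, not youki's API; only the /proc/<pid>/uid_map, setgroups, and gid_map files and their line format come from the kernel's user namespace interface.

```rust
use std::fs;
use std::io;

/// One range of an ID mapping, mirroring the OCI spec's
/// containerID / hostID / size fields (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct IdMapping {
    container_id: u32,
    host_id: u32,
    size: u32,
}

/// Render mappings in the "<inside-id> <outside-id> <length>" line format
/// the kernel expects in /proc/<pid>/uid_map and /proc/<pid>/gid_map.
fn format_id_map(mappings: &[IdMapping]) -> String {
    mappings
        .iter()
        .map(|m| format!("{} {} {}\n", m.container_id, m.host_id, m.size))
        .collect()
}

/// Write the maps for a child that has already entered a new user namespace.
/// Without privileges, "deny" must be written to setgroups before gid_map,
/// and each map may only hold one line mapping the writer's own effective ID.
#[allow(dead_code)]
fn write_id_maps(pid: u32, uids: &[IdMapping], gids: &[IdMapping]) -> io::Result<()> {
    fs::write(format!("/proc/{}/uid_map", pid), format_id_map(uids))?;
    fs::write(format!("/proc/{}/setgroups", pid), "deny")?;
    fs::write(format!("/proc/{}/gid_map", pid), format_id_map(gids))?;
    Ok(())
}

fn main() {
    // Map root (0) inside the container to host UID 1001.
    let root_to_1001 = [IdMapping { container_id: 0, host_id: 1001, size: 1 }];
    print!("{}", format_id_map(&root_to_1001)); // prints "0 1001 1"
}
```

Writing "deny" to setgroups first is what lets an unprivileged parent write gid_map at all; a privileged runtime could skip that step.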

@Furisto
This is #46, isn't it?

I'd like you to start with the parts that can be implemented outside of that PR. Do you want to take on the challenge?

@utam0k Yes, correct.

@utam0k Yeah, I am planning to work on the necessary adaptations for the create command this weekend.

Update: I can now create a container without being root and map root inside the container to a non-root user on the host. I am still investigating problems with bind mounts and getting a shell into the container.

I'm following the README, and the create step triggers an error:

$ curl https://gist.githubusercontent.com/utam0k/8ab419996633066eaf53ac9c66d962e7/raw/e81548f591f26ec03d85ce38b0443144573b4cf6/config.json -o config.json
$ cd ../
$ ./youki create -b tutorial tutorial_container
$ ./youki state tutorial_container # You can see the state the container is in as it is being generated.
$ ./youki start tutorial_container
$ ./youki state tutorial_container # Run it within 5 seconds to see the running container.
$ ./youki delete tutorial_container # Run it after the container is finished running.
yukang@mango:~/youki$ ./youki create -b demo demo_container
[DEBUG src/container/init_builder.rs:90] 2021-08-23T13:59:59.138549106+00:00 container directory will be "/run/user/1001/demo_container"
[DEBUG src/container/container.rs:94] 2021-08-23T13:59:59.189015677+00:00 Save container status: Container { state: State { oci_version: "v1.0.2", id: "demo_container", status: Creating, pid: None, bundle: "demo", annotations: Some({}), created: None, creator: None, use_systemd: None }, root: "/run/user/1001/demo_container" } in "/run/user/1001/demo_container"
[DEBUG src/rootless.rs:34] 2021-08-23T13:59:59.189315080+00:00 rootless container should be created
[WARN src/rootless.rs:35] 2021-08-23T13:59:59.189352980+00:00 resource constraints and multi id mapping is unimplemented for rootless containers
Error: rootless containers require gid_mappings in spec

Do we need to update the config.json for the rootless container?

@chenyukang Yes, the current spec does not work for rootless containers. Are you interested in fixing it?
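
A hedged sketch of the kind of check behind the "rootless containers require gid_mappings in spec" error above. It uses hypothetical stand-in types rather than youki's or oci-spec-rs's real structs; the error strings simply mirror the log.

```rust
/// (containerID, hostID, size), as in the OCI spec's ID mapping entries.
type Mapping = (u32, u32, u32);

/// Hypothetical stand-in for the relevant part of an OCI `linux` section.
#[derive(Debug, Default)]
struct LinuxSpec {
    namespaces: Vec<String>,
    uid_mappings: Option<Vec<Mapping>>,
    gid_mappings: Option<Vec<Mapping>>,
}

/// True when a mapping list is absent or empty; either way it is unusable.
fn is_missing(mappings: &Option<Vec<Mapping>>) -> bool {
    mappings.as_ref().map_or(true, |v| v.is_empty())
}

/// Reject a rootless spec that lacks a user namespace or ID mappings.
fn validate_rootless(linux: &LinuxSpec) -> Result<(), String> {
    if !linux.namespaces.iter().any(|ns| ns == "user") {
        return Err("rootless containers require a user namespace".to_string());
    }
    if is_missing(&linux.uid_mappings) {
        return Err("rootless containers require uid_mappings in spec".to_string());
    }
    if is_missing(&linux.gid_mappings) {
        return Err("rootless containers require gid_mappings in spec".to_string());
    }
    Ok(())
}

fn main() {
    // Example: a user namespace and uid mappings, but no gid mappings.
    let spec = LinuxSpec {
        namespaces: vec!["user".to_string()],
        uid_mappings: Some(vec![(0, 1001, 1)]),
        gid_mappings: None,
    };
    println!("{:?}", validate_rootless(&spec));
}
```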

I'm interested in fixing it.
But I'm still trying to understand the code since this is my first try at it. :)

I think runc has runc spec --rootless. Maybe that's what we should add. @chenyukang Would you like to add it? If so, just create a new issue and ping us if you need any help.

Thanks @yihuaf. I used runc spec --rootless to generate a config.json; we need to add this to our config.json:

  "uidMappings": [
    {
      "containerID": 0,
      "hostID": 1001,
      "size": 1
    }
  ],
  "gidMappings": [
    {
      "containerID": 0,
      "hostID": 1001,
      "size": 1
    }
  ],
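
To read these entries: each one maps the range [containerID, containerID + size) inside the container onto [hostID, hostID + size) on the host, so with size 1 only container root (0) is mapped, to host user 1001. A small Rust sketch of that lookup, using a hypothetical to_host_id helper and stand-in type:

```rust
/// One uidMappings/gidMappings entry (hypothetical stand-in type).
#[derive(Debug, Clone, Copy)]
struct IdMapping {
    container_id: u32,
    host_id: u32,
    size: u32,
}

/// Translate an ID used inside the container to the host ID it maps to.
/// Returns None when no range covers it, i.e. the ID is unmapped.
fn to_host_id(mappings: &[IdMapping], container_id: u32) -> Option<u32> {
    mappings.iter().find_map(|m| {
        // The range is [m.container_id, m.container_id + m.size);
        // widen to u64 so computing the end cannot overflow.
        let start = u64::from(m.container_id);
        let end = start + u64::from(m.size);
        let id = u64::from(container_id);
        if id >= start && id < end {
            m.host_id.checked_add(container_id - m.container_id)
        } else {
            None
        }
    })
}

fn main() {
    // The mapping from the snippet above: container root -> host UID 1001, one ID only.
    let uid_mappings = [IdMapping { container_id: 0, host_id: 1001, size: 1 }];
    println!("{:?}", to_host_id(&uid_mappings, 0)); // Some(1001)
    println!("{:?}", to_host_id(&uid_mappings, 1)); // None: only one ID is mapped
}
```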

Should we add a demo config.json to the repo, maybe in a directory such as youki_integration_test/config/,...

Currently the README.md uses a gist. Or should I create a new gist and update the README?

When I made this tutorial, there was no spec command yet, so I used a gist. Now that the spec command is available, I think it would be a good idea to create a tutorial based on it.

@chenyukang Yes, we need to add uid and gid mappings and a user namespace to the spec. And I agree we should change the README tutorial to use youki spec and youki spec --rootless. I'd rather not copy snippets around if we can avoid it. Just a heads-up: there are some cases where rootless youki is not working quite right yet.
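
Roughly what a youki spec --rootless option might change, as a sketch with hypothetical types and a hypothetical make_rootless function: add a user namespace, map container root to the caller's UID/GID, and drop the network namespace (the rootless config posted later in this thread also has no network namespace). The caller's IDs are passed in as parameters here; a real implementation would read them from the process (for example geteuid()/getegid() via the libc or nix crates).

```rust
/// Hypothetical stand-ins for the parts of an OCI spec a rootless option would touch.
#[derive(Debug, Clone, PartialEq)]
struct IdMapping {
    container_id: u32,
    host_id: u32,
    size: u32,
}

#[derive(Debug, Default)]
struct LinuxSpec {
    namespaces: Vec<String>,
    uid_mappings: Vec<IdMapping>,
    gid_mappings: Vec<IdMapping>,
}

/// Adjust a default spec so an unprivileged user can run it (sketch only).
fn make_rootless(linux: &mut LinuxSpec, host_uid: u32, host_gid: u32) {
    // Assumption: without extra tooling an unprivileged user cannot connect a
    // new network namespace to the host, so drop the network namespace.
    linux.namespaces.retain(|ns| ns != "network");
    // Request a user namespace exactly once.
    if !linux.namespaces.iter().any(|ns| ns == "user") {
        linux.namespaces.push("user".to_string());
    }
    // Without newuidmap/newgidmap, only a single mapping of one's own ID is possible.
    linux.uid_mappings = vec![IdMapping { container_id: 0, host_id: host_uid, size: 1 }];
    linux.gid_mappings = vec![IdMapping { container_id: 0, host_id: host_gid, size: 1 }];
}

fn main() {
    let mut linux = LinuxSpec {
        namespaces: ["pid", "network", "ipc", "uts", "mount"]
            .iter()
            .map(|s| s.to_string())
            .collect(),
        ..Default::default()
    };
    make_rootless(&mut linux, 1000, 1000);
    println!("{:#?}", linux);
}
```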

Yes, I will update the README to use youki spec.
youki spec --rootless is not available yet, so we need to add it.
After that, I'll add another demo to the tutorial for rootless mode.

@chenyukang
This is a wonderful initiative. If you'd like, could you create an issue for this?

Currently, I get a permission error.

This config.json was generated using containers/oci-spec-rs#49:

{
  "ociVersion": "1.0.2-dev",
  "root": {
    "path": "rootfs",
    "readonly": true
  },
  "mounts": [
    {
      "destination": "/proc",
      "type": "proc",
      "source": "proc",
      "options": []
    },
    {
      "destination": "/dev",
      "type": "tmpfs",
      "source": "tmpfs",
      "options": [
        "nosuid",
        "strictatime",
        "mode=755",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/pts",
      "type": "devpts",
      "source": "devpts",
      "options": [
        "nosuid",
        "noexec",
        "newinstance",
        "ptmxmode=0666",
        "mode=0620"
      ]
    },
    {
      "destination": "/dev/shm",
      "type": "tmpfs",
      "source": "shm",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "mode=1777",
        "size=65536k"
      ]
    },
    {
      "destination": "/dev/mqueue",
      "type": "mqueue",
      "source": "mqueue",
      "options": [
        "nosuid",
        "noexec",
        "nodev"
      ]
    },
    {
      "destination": "/sys",
      "type": "none",
      "source": "/sys",
      "options": [
        "rbind",
        "nosuid",
        "noexec",
        "nodev",
        "ro"
      ]
    },
    {
      "destination": "/sys/fs/cgroup",
      "type": "cgroup",
      "source": "cgroup",
      "options": [
        "nosuid",
        "noexec",
        "nodev",
        "relatime",
        "ro"
      ]
    }
  ],
  "process": {
    "terminal": true,
    "user": {
      "uid": 0,
      "gid": 0
    },
    "args": [
      "sh"
    ],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "TERM=xterm"
    ],
    "cwd": "/",
    "capabilities": {
      "bounding": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "effective": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "inheritable": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "permitted": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ],
      "ambient": [
        "CAP_AUDIT_WRITE",
        "CAP_KILL",
        "CAP_NET_BIND_SERVICE"
      ]
    },
    "rlimits": [
      {
        "type": "RLIMIT_NOFILE",
        "hard": 1024,
        "soft": 1024
      }
    ],
    "noNewPrivileges": true
  },
  "hostname": "youki",
  "annotations": {},
  "linux": {
    "uidMappings": [
      {
        "hostID": 1000,
        "containerID": 0,
        "size": 1
      }
    ],
    "gidMappings": [
      {
        "hostID": 1000,
        "containerID": 0,
        "size": 1
      }
    ],
    "resources": {
      "devices": [
        {
          "allow": false,
          "type": null,
          "major": null,
          "minor": null,
          "access": "rwm"
        }
      ],
      "disableOomKiller": false,
      "oomScoreAdj": null,
      "freezer": null
    },
    "namespaces": [
      {
        "type": "pid"
      },
      {
        "type": "ipc"
      },
      {
        "type": "uts"
      },
      {
        "type": "mount"
      },
      {
        "type": "user"
      }
    ],
    "maskedPaths": [
      "/proc/acpi",
      "/proc/asound",
      "/proc/kcore",
      "/proc/keys",
      "/proc/latency_stats",
      "/proc/timer_list",
      "/proc/timer_stats",
      "/proc/sched_debug",
      "/sys/firmware",
      "/proc/scsi"
    ],
    "readonlyPaths": [
      "/proc/bus",
      "/proc/fs",
      "/proc/irq",
      "/proc/sys",
      "/proc/sysrq-trigger"
    ]
  }
}

Running it with ./youki create -b tutorial/ demo-container fails with this error:

[DEBUG src/container/init_builder.rs:91] 2021-08-27T16:12:38.650261859+00:00 container directory will be "/run/user/1000/demo8"
[DEBUG src/container/container.rs:94] 2021-08-27T16:12:38.660403546+00:00 Save container status: Container { state: State { oci_version: "v1.0.2", id: "demo8", status: Creating, pid: None, bundle: "tutorial/", annotations: Some({}), created: None, creator: None, use_systemd: None }, root: "/run/user/1000/demo8" } in "/run/user/1000/demo8"
[DEBUG src/rootless.rs:33] 2021-08-27T16:12:38.661537990+00:00 rootless container should be created
[WARN src/rootless.rs:34] 2021-08-27T16:12:38.661877603+00:00 resource constraints and multi id mapping is unimplemented for rootless containers
[INFO cgroups/src/common.rs:174] 2021-08-27T16:12:38.671359365+00:00 cgroup manager V1 will be used
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.672646714+00:00 Get path for subsystem: cpu
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.678170725+00:00 Get path for subsystem: cpuacct
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.687545684+00:00 Get path for subsystem: cpuset
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.697199553+00:00 Get path for subsystem: devices
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.706011389+00:00 Get path for subsystem: hugetlb
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.710140147+00:00 Get path for subsystem: memory
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.717965746+00:00 Get path for subsystem: pids
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.727987229+00:00 Get path for subsystem: perf_event
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.739213458+00:00 Get path for subsystem: blkio
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.747256766+00:00 Get path for subsystem: net_prio
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.756625524+00:00 Get path for subsystem: net_cls
[DEBUG cgroups/src/v1/manager.rs:42] 2021-08-27T16:12:38.774809919+00:00 Get path for subsystem: freezer
[DEBUG src/process/init.rs:195] 2021-08-27T16:12:38.800129386+00:00 creating new user namespace
[DEBUG src/process/channel.rs:39] 2021-08-27T16:12:38.800438598+00:00 send identifier mapping request
[DEBUG src/process/channel.rs:53] 2021-08-27T16:12:38.800539402+00:00 waiting for mapping ack
[DEBUG src/container/builder_impl.rs:92] 2021-08-27T16:12:38.800542702+00:00 write mapping for pid Pid(1910227)
[DEBUG src/process/channel.rs:45] 2021-08-27T16:12:38.800788411+00:00 identifier mapping written
[DEBUG src/rootfs.rs:39] 2021-08-27T16:12:38.804138739+00:00 mount root fs "/home/coder/youki/tutorial/rootfs"
[WARN src/rootfs.rs:54] 2021-08-27T16:12:38.805276783+00:00 A feature of cgroup is unimplemented.
[DEBUG src/process/init.rs:145] 2021-08-27T16:12:38.806197718+00:00 readonly path "/proc/bus" mounted
[DEBUG src/process/init.rs:145] 2021-08-27T16:12:38.806467428+00:00 readonly path "/proc/fs" mounted
[DEBUG src/process/init.rs:145] 2021-08-27T16:12:38.806706838+00:00 readonly path "/proc/irq" mounted
[DEBUG src/process/init.rs:145] 2021-08-27T16:12:38.807065251+00:00 readonly path "/proc/sys" mounted
[DEBUG src/process/init.rs:145] 2021-08-27T16:12:38.807620272+00:00 readonly path "/proc/sysrq-trigger" mounted
[DEBUG src/capabilities.rs:21] 2021-08-27T16:12:38.808037788+00:00 reset all caps
[DEBUG src/capabilities.rs:28] 2021-08-27T16:12:38.808296898+00:00 dropping bounding capabilities to Some([CAP_AUDIT_WRITE, CAP_KILL, CAP_NET_BIND_SERVICE])
[WARN src/syscall/linux.rs:132] 2021-08-27T16:12:38.808660512+00:00 CAP_PERFMON is not supported.
[WARN src/syscall/linux.rs:132] 2021-08-27T16:12:38.808974024+00:00 CAP_CHECKPOINT_RESTORE is not supported.
[WARN src/syscall/linux.rs:132] 2021-08-27T16:12:38.809343238+00:00 CAP_BPF is not supported.
[DEBUG src/process/channel.rs:30] 2021-08-27T16:12:38.813594201+00:00 sending init pid (Pid(-1))
[DEBUG src/process/channel.rs:94] 2021-08-27T16:12:38.814148822+00:00 received child ready message
[DEBUG src/process/channel.rs:30] 2021-08-27T16:12:38.814267526+00:00 sending init pid (Pid(1910228))
[DEBUG src/process/channel.rs:94] 2021-08-27T16:12:38.814920951+00:00 received child ready message
[DEBUG src/container/builder_impl.rs:100] 2021-08-27T16:12:38.815016555+00:00 init pid is Pid(1910228)
Error: Permission denied (os error 13)

@yihuaf, do you know what's going wrong?

./youki --root /tmp/runc run demo also gets the same error.
It seems we lack permission to do something, but if we use sudo, youki will not run rootless, according to should_use_rootless.

I was trying to fix this issue. Can you try again with the tip of the tree? As long as the spec has a user namespace, you should be able to run rootless, with or without sudo. There are likely still issues with it, but it should run now.
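
The change described here amounts to switching the rootless decision from "who is running youki" to "what does the spec ask for". The sketch below contrasts the two rules. Both functions are hypothetical and are not youki's actual should_use_rootless; the euid-based rule is only inferred from the earlier remark that running under sudo disabled rootless mode.

```rust
/// Rule as it apparently behaved before (inferred from this thread):
/// rootless only when youki itself is not root, so sudo disabled it.
fn rootless_by_euid(euid: u32) -> bool {
    euid != 0
}

/// Rule described above: rootless whenever the spec asks for a user
/// namespace, whether or not youki runs under sudo.
fn rootless_by_spec(namespaces: &[&str]) -> bool {
    namespaces.contains(&"user")
}

fn main() {
    let spec_namespaces = ["pid", "ipc", "uts", "mount", "user"];
    for euid in [0u32, 1000] {
        println!(
            "euid {:>4}: by euid = {}, by spec = {}",
            euid,
            rootless_by_euid(euid),
            rootless_by_spec(&spec_namespaces)
        );
    }
}
```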

I merged the latest code from master, but there is still an error. The last one is "Failed to add tasks to cgroup manager":

[DEBUG src/container/builder_impl.rs:103] 2021-08-27T23:39:39.428075772+00:00 init pid is Pid(2333737)
Error: Failed to add tasks to cgroup manager

Caused by:
    Permission denied (os error 13)
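
One plausible reading of this error, given that the log shows the cgroup v1 manager: adding the init process to a cgroup means writing its PID into the cgroup's cgroup.procs file, and under /sys/fs/cgroup those files are normally owned by root, so an unprivileged write fails with EACCES (os error 13). That is why cgroup creation for rootless containers was to be delegated to systemd. A minimal sketch, with add_task as a hypothetical helper and a hypothetical default path:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Move a process into a cgroup by writing its PID to the cgroup's
/// cgroup.procs file (hypothetical helper). If the directory is root-owned,
/// an unprivileged caller gets EACCES here: "Permission denied (os error 13)".
fn add_task(cgroup_dir: &Path, pid: u32) -> io::Result<()> {
    let procs = cgroup_dir.join("cgroup.procs");
    // In a real cgroup directory the file already exists, so open it without create.
    let mut file = OpenOptions::new().write(true).open(&procs).map_err(|e| {
        io::Error::new(e.kind(), format!("cannot open {}: {}", procs.display(), e))
    })?;
    file.write_all(pid.to_string().as_bytes())
}

fn main() {
    // Hypothetical default path; pass a real cgroup directory as the first argument.
    let dir = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/sys/fs/cgroup/youki-demo".to_string());
    match add_task(Path::new(&dir), std::process::id()) {
        Ok(()) => println!("added pid {} to {}", std::process::id(), dir),
        Err(e) => eprintln!("failed to add task: {}", e),
    }
}
```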