Disable creation of cgroups in jail
Dragoncraft89 opened this issue · 1 comment
Dragoncraft89 commented
If /sys/fs/cgroup is mounted inside the jail, the jailed application can create new cgroups. Even though it cannot move itself into them, creating them without limit could exhaust the kernel's cgroup resources.
Example:
$ ./nsjail --chroot / -- /bin/bash
[I][2022-07-29T14:15:12+0200] Mode: STANDALONE_ONCE
[I][2022-07-29T14:15:12+0200] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'/bin/bash', bind:[::]:0, max_conns:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:true, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, clone_newtime:false, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2022-07-29T14:15:12+0200] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2022-07-29T14:15:12+0200] Mount: '/proc' flags:MS_RDONLY type:'proc' options:'' dir:true
[I][2022-07-29T14:15:12+0200] Uid map: inside_uid:1000 outside_uid:1000 count:1 newuidmap:false
[I][2022-07-29T14:15:12+0200] Gid map: inside_gid:1000 outside_gid:1000 count:1 newgidmap:false
[I][2022-07-29T14:15:12+0200] Executing '/bin/bash' for '[STANDALONE MODE]'
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
$ mkdir /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/mycgroup
$ ls /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/mycgroup
cgroup.controllers cgroup.max.descendants cgroup.type memory.events memory.min memory.swap.current pids.events
cgroup.events cgroup.procs cpu.pressure memory.events.local memory.numa_stat memory.swap.events pids.max
cgroup.freeze cgroup.stat cpu.stat memory.high memory.oom.group memory.swap.high
cgroup.kill cgroup.subtree_control io.pressure memory.low memory.pressure memory.swap.max
cgroup.max.depth cgroup.threads memory.current memory.max memory.stat pids.current
$ # However, the new cgroup cannot be entered:
$ echo $$ > /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/mycgroup/cgroup.procs
bash: echo: write error: No such file or directory
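The manual mkdir check above can be wrapped in a small helper for reuse. This is only a sketch and not part of nsjail: the function name probe_cgroup_creation and the probe directory name nsjail-probe.<pid> are hypothetical. It tries to create one child cgroup under the given parent, removes it again on success so nothing lingers in the host hierarchy, and prints created or denied.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an nsjail feature): report whether a child cgroup
# can be created under the directory given as $1.
probe_cgroup_creation() {
  local parent="$1"
  # Hypothetical probe name, made unique with the shell's PID.
  local child="$parent/nsjail-probe.$$"
  if mkdir "$child" 2>/dev/null; then
    # Creation worked; remove the (empty) probe cgroup again.
    rmdir "$child"
    echo "created"
  else
    # mkdir failed (no permission, missing parent, read-only mount, ...).
    echo "denied"
  fi
}
```

Inside the jail from the example, probe_cgroup_creation /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service would be expected to print created, matching the mkdir above.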
My current workaround is not to mount /sys/fs/cgroup into the jail, but could this be prevented by default?
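To confirm from inside the jail that the workaround took effect, one option is to list any cgroup filesystems in /proc/self/mountinfo. This is a minimal sketch assuming the standard mountinfo layout (mount point in field 5, filesystem type right after the lone - separator); cgroupfs_mounts is a hypothetical name. An empty result means no cgroup or cgroup2 mount is visible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the mount point of every cgroup (v1) or
# cgroup2 filesystem listed in a mountinfo file (default: this process's).
cgroupfs_mounts() {
  local mountinfo="${1:-/proc/self/mountinfo}"
  awk '{
    # Optional fields start at field 7 and end with a lone "-";
    # the filesystem type is the field right after that separator.
    for (i = 7; i <= NF; i++) {
      if ($i == "-") {
        if ($(i + 1) == "cgroup" || $(i + 1) == "cgroup2") print $5
        break
      }
    }
  }' "$mountinfo"
}
```

With the --chroot / invocation from the example, this would be expected to list /sys/fs/cgroup, since the recursive bind of / carries the host's cgroup mount along.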
okunz commented
If you drop the --chroot /, this should no longer be possible. The issue is that, configured this way, nsjail exposes the host's entire root directory inside the sandbox while the jailed process keeps the same UID/GID as outside (the log shows inside_uid:1000 mapped to outside_uid:1000).
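The UID/GID point can be checked from inside the jail by reading /proc/self/uid_map (or gid_map), whose lines are "inside-ID outside-ID count". A range whose inside and outside IDs are equal means the jailed process acts on host files, including anything under a bind-mounted /sys/fs/cgroup, with the permissions of that same host user. This is a sketch; identity_mapped_ranges is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every ID range in a uid_map/gid_map file whose
# inside ID equals its outside ID (default: this process's uid_map).
identity_mapped_ranges() {
  local map="${1:-/proc/self/uid_map}"
  # Fields are whitespace-padded: inside-ID outside-ID count.
  awk '$1 == $2 { print "inside=" $1 " outside=" $2 " count=" $3 }' "$map"
}
```

For the invocation in the original report this would be expected to print inside=1000 outside=1000 count=1; pass /proc/self/gid_map to check the GID side.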