SUSE/cpuset

cset error :failed to create shield, hint: do other cpusets exist?

Hello, I get an error when I try to use shield on Ubuntu.
Maybe this is the same as #26?
Tested with both the current version and version 1.6.

root@host:~# cset shield -c 0-1
cset: --> failed to create shield, hint: do other cpusets exist?
cset: **> [Errno 22] Invalid argument

root@host:~# cset --version
cset: Cpuset (cset) 1.5.6

root@host:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.4 LTS
Release:        18.04
Codename:       bionic

Same result after manually installing the latest package on 19.10:

root@host2:~# cset shield -c 0-1
cset: --> failed to create shield, hint: do other cpusets exist?
cset: **> [Errno 22] Invalid argument

root@red-desktop:~# cset --version
cset: Cpuset (cset) 1.6

root@red-desktop:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 19.10
Release:        19.10
Codename:       eoan

Thanks.

I have the same error (failed to create shield, hint: do other cpusets exist?) on Debian buster with cset version 1.5.6.

cset set shows that I have an additional set for docker. I wonder if this is the issue; I am not sure how to select the right cpuset.

cset: 
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root        0-7 y       0 y  1550    2 /
       docker        0-7 n       0 n     0    9 /docker
 machine.slice        0-7 n       0 n     0    0 /machine.slice

It looks like cset makes some assumptions about existing cpusets. Manually creating a set with cset set -s user -c 0-7, then invoking shield with cset shield --sysset=root --userset=user -k on -c 6-7 to specify the appropriate sets explicitly, gets a step further but still errors out:

CSET_DEBUG_LEVEL=10 cset shield --sysset=root --userset=user -k on -c 6-7
cset: **> [Errno 13] Permission denied
cset: insufficient permissions, you probably need to be root
Traceback (most recent call last):
  File "/usr/bin/cset", line 47, in <module>
    main()
  File "/usr/lib/python2.7/dist-packages/cpuset/main.py", line 228, in main
    command.func(parser, options, args)
  File "/usr/lib/python2.7/dist-packages/cpuset/commands/shield.py", line 262, in func
    make_shield(options.cpu, options.kthread)
  File "/usr/lib/python2.7/dist-packages/cpuset/commands/shield.py", line 395, in make_shield
    set.modify(SYS_SET, cpuspec_inv, memspec, False, False)
  File "/usr/lib/python2.7/dist-packages/cpuset/commands/set.py", line 411, in modify
    if cpuspec: nset.cpus = cpuspec
  File "/usr/lib/python2.7/dist-packages/cpuset/cset.py", line 186, in setcpus
    f.close()
IOError: [Errno 13] Permission denied

Removing the docker cset so there's nothing there doesn't seem to help either.

(You can also enable debug output with CSET_DEBUG_LEVEL=10.)

Manually creating and moving processes between sets still works, for example:

  • sudo cset set -s system -c 0-1 -m 0 to create a system cpuset with cores 0-1
  • sudo cset set -s user -c 2-7 -m 0 to create a user cpuset with cores 2-7
  • sudo cset proc -m -s root -t system -k to move tasks from / to /system cpuset
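
The three manual steps above can be scripted. The following is only a sketch: it reuses the exact cset invocations from the list, while the helper names `manual_shield_commands` and `apply` and their parameters are hypothetical. It defaults to a dry run that only prints the commands; a real run needs root, as the sudo in the list suggests.

```python
import subprocess


def manual_shield_commands(system_cpus="0-1", user_cpus="2-7", mems="0"):
    """Build the cset invocations from the list above as argv lists."""
    return [
        # Create the "system" cpuset for housekeeping tasks.
        ["cset", "set", "-s", "system", "-c", system_cpus, "-m", mems],
        # Create the "user" cpuset for the isolated workload.
        ["cset", "set", "-s", "user", "-c", user_cpus, "-m", mems],
        # Move tasks (including kernel threads, -k) from / to /system.
        ["cset", "proc", "-m", "-s", "root", "-t", "system", "-k"],
    ]


def apply(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # stops at the first failure


apply(manual_shield_commands())
```

Calling `apply(manual_shield_commands(), dry_run=False)` would execute the commands in order and raise on the first one that fails.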

I cannot, however, alter the docker cpuset:

➜  ~ cset set -l                           
cset: 
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root        0-7 y       0 y   136    3 /
         vfio        2-7 n       0 n     0    0 /vfio
       docker        0-7 n       0 n     0    8 /docker
       system        0-1 n       0 n  1556    0 /system

➜  ~ sudo cset set -s docker -c 0-1
cset: **> [Errno 16] Device or resource busy

Applying this patch (without the python2 part) seems to resolve the issue, at least for me:
https://rokups.github.io/#!pages/gaming-vm-performance.md#Update_1:_cpuset_patch

Iiiinteresting, I'll have to try this patch.

It's not the root issue but in case it's useful for others, to control the docker cset you need to change the cgroup driver to systemd with something like "exec-opts": ["native.cgroupdriver=systemd"] in /etc/docker/daemon.json.
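
A minimal sketch of that daemon.json change, assuming the file is (or will be) a JSON object. The helper name `with_systemd_cgroup_driver` is hypothetical; only the `exec-opts` key and the `native.cgroupdriver=systemd` value come from the comment above. It works on a dict rather than on the file, so any existing settings can be inspected before anything is written back.

```python
import json


def with_systemd_cgroup_driver(config):
    """Return a copy of a daemon.json dict that selects the systemd cgroup driver."""
    updated = dict(config)
    # Drop any previous cgroupdriver setting but keep unrelated exec-opts.
    opts = [opt for opt in updated.get("exec-opts", [])
            if not opt.startswith("native.cgroupdriver=")]
    opts.append("native.cgroupdriver=systemd")
    updated["exec-opts"] = opts
    return updated


# Starting from an empty configuration:
print(json.dumps(with_systemd_cgroup_driver({}), indent=2))
```

The Docker daemon has to be restarted for a changed /etc/docker/daemon.json to take effect.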

@ryankurte Let me know if this patch works for your case; if it does, I can open a PR.

Posting a cleaned version of the patch here if someone wants to try it: cpuset.txt

@thiagokokada The patch fixed the error that I had. I am curious, though: why did commenting out two lines fix the whole problem? I would like to know more about this.

I don't understand the code well enough to know why this fixes the problem, but AFAIK it is probably skipping some guards.

My use case is isolating CPUs for a VM in libvirt. Without this patch, libvirt (which AFAIK uses the kernel's cpuset interface directly, not this program) can't migrate the vCPUs to the isolated CPUs, but with this patch it works fine.

It definitely works fine for me, because there are no user-space or kernel threads running in the VM (this VM is highly sensitive to latency, so any thread stealing CPU cycles results in huge latency spikes).

So I think these are the options that commenting out those lines disables: https://github.com/lpechacek/cpuset/blob/master/cpuset/commands/set.py#L166-L171. They're not exposed in shield, though, so I think shield sets both of those options to true unconditionally.

Maybe a patch exposing those options to shield would do the trick.

So we have these calls in the make_shield() function: https://github.com/lpechacek/cpuset/blob/master/cpuset/commands/shield.py#L379-L380

That is basically this function: https://github.com/lpechacek/cpuset/blob/master/cpuset/commands/set.py#L383-L400

So yeah, it basically calls modify() with the CPU-exclusive flag set (but not the memory-exclusive one, contrary to what I said before). A better patch would be to simply pass False in those two lines: https://github.com/lpechacek/cpuset/blob/master/cpuset/commands/shield.py#L379-L380.

A slightly better patch: cpuset2.txt

Dropping the CPU exclusivity of the cpuset won't keep the CPUs "shielded" as intended. I don't think this is an approach that would be broadly acceptable.

@thiagokokada Do you also have a cpuset cgroup on your system that intersects the shielded CPUs (as in the original report)? What happens if you exclude the offending CPUs from that cgroup?
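
The intersection asked about here can be checked by hand. As far as the cgroup v1 cpuset documentation describes it, a CPU-exclusive cpuset may not share CPUs with any sibling cpuset, which would explain the [Errno 22] when a docker cpuset covers the shielded CPUs. The sketch below is not part of cset; `parse_cpuspec` and `overlapping_siblings` are hypothetical helpers, and the CPU lists are copied (one of them abbreviated) from listings in this thread.

```python
def parse_cpuspec(spec):
    """Parse a cpuset list such as "0-1,3-7" into a set of CPU numbers."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # an empty cpuset.cpus contributes no CPUs
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def overlapping_siblings(wanted_spec, siblings):
    """Return the names of sibling cpusets that share a CPU with wanted_spec.

    siblings maps a cpuset name to its cpuset.cpus string.
    """
    wanted = parse_cpuspec(wanted_spec)
    return sorted(name for name, spec in siblings.items()
                  if wanted & parse_cpuspec(spec))


# Shielding CPUs 6-7 while docker and machine.slice both span 0-7:
print(overlapping_siblings("6-7", {"docker": "0-7", "machine.slice": "0-7"}))
# Shielding CPU 2 while docker holds 0,2,6,... (list abbreviated here):
print(overlapping_siblings("2", {"docker": "0,2,6,8,10,12"}))
# The same request once docker's CPU list has been emptied:
print(overlapping_siblings("2", {"docker": ""}))
```

The first two calls report the conflicting sets and the last one reports none. The complementary system cpuset that shield creates is CPU-exclusive too, so the same check applies to its CPU list.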

I also had this error:

$ sudo cset shield --cpu 2
cset: --> failed to create shield, hint: do other cpusets exist?
cset: **> [Errno 22] Invalid argument

I see I have no docker containers running:

$ sudo docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

But I do have a cpu set for docker:

$ sudo cset set -l
cset: 
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root       0-63 y     0-1 y  4117    1 /
       docker 0,2,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62 n     0-1 n     0    0 ...

But on another machine, where cset shield works just fine, I have:

$ cset set -l
cset: 
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root        0-7 y       0 y   155    3 /
         user          2 y       0 n     5    0 /user
       docker      ***** n   ***** n     0    0 /docker
       system    0-1,3-7 y       0 n   955    0 /system

So on the first machine I tried removing the cpus assigned to docker:

root# echo > /sys/fs/cgroup/cpuset/docker/cpuset.cpus

Then the list looks like this:

$ cset set -l
cset: 
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root       0-63 y     0-1 y  4112    1 /
       docker      ***** n     0-1 n     0    0 /docker

And I am able to create a shield with no errors:

$ sudo cset shield --cpu 2                        
cset: --> activating shielding:
cset: moving 3370 tasks from root into system cpuset...
[==================================================]%
cset: "system" cpuset of CPUSPEC(0-1,3-63) with 3370 tasks running
cset: "user" cpuset of CPUSPEC(2) with 0 tasks running
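
The workaround above (emptying the docker cpuset's CPU list before creating the shield) can be sketched as a small helper. This assumes a cgroup v1 cpuset hierarchy like the one under /sys/fs/cgroup/cpuset shown earlier, where each cpuset directory has a `tasks` file. The function name `clear_cpuset_cpus` and the safety check are hypothetical additions, not something cset does.

```python
import os


def clear_cpuset_cpus(cgroup_dir):
    """Empty cpuset.cpus of a cgroup v1 cpuset, refusing if tasks are attached."""
    with open(os.path.join(cgroup_dir, "tasks")) as tasks_file:
        if tasks_file.read().strip():
            raise RuntimeError("%s still has tasks attached" % cgroup_dir)
    # Writing a bare newline mirrors `echo > cpuset.cpus` from the session above.
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as cpus_file:
        cpus_file.write("\n")


# Example call (needs root on a real system):
# clear_cpuset_cpus("/sys/fs/cgroup/cpuset/docker")
```

The check on `tasks` is there because the docker set in this report had no tasks attached. Whether the kernel accepts the write when tasks or child cpusets exist is not covered by this sketch.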