google/nsjail

sched_setaffinity(max_cpus=1) failed: Invalid argument

wizeman opened this issue · 2 comments

When running nsjail with --max_cpus 1, about 50% of the time nsjail fails with the following errors:

[D][2022-08-02T10:30:54+0200][1] void cpu::setRandomCpu(cpu_set_t*, size_t, size_t)():45 Setting allowed CPU#:3 of [0-3]
[W][2022-08-02T10:30:54+0200][1] bool cpu::initCpu(nsjconf_t*)():85 sched_setaffinity(max_cpus=1) failed: Invalid argument
[F][2022-08-02T10:30:54+0200][1] bool subproc::runChild(nsjconf_t*, int, int, int, int)():448 Launching child process failed
[W][2022-08-02T10:30:54+0200][1029523] bool subproc::runChild(nsjconf_t*, int, int, int, int)():478 Received error message from the child process before it has been executed
[E][2022-08-02T10:30:54+0200][1029523] int nsjail::standaloneMode(nsjconf_t*)():256 Couldn't launch the child process
[D][2022-08-02T10:30:54+0200][1029523] int main(int, char**)():343 Returning with 255

This only started happening after SMT / hyperthreading was disabled (due to the retbleed=auto,nosmt kernel parameter).

Note that this machine has 4 cores / 8 threads, but with SMT disabled, the Linux kernel reports processors 0, 2, 4 and 6 as online (and therefore 1, 3, 5 and 7 as offline), according to /proc/cpuinfo.
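
The debug log above shows CPU#:3 being picked out of [0-3], and CPU 3 is one of the offline processors here. sched_setaffinity() returns EINVAL when the requested mask contains no CPU that is both present and permitted, so picking by index into a contiguous range would fail whenever it lands on an offline CPU. Below is a minimal sketch of the alternative: pick only from CPUs that are set in an existing affinity mask. This is not nsjail's actual code; the names allowedCpus and pickRandomCpus are hypothetical and only illustrate the idea.

```cpp
#include <sched.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: list the CPU indices that are set in an affinity mask.
std::vector<size_t> allowedCpus(const cpu_set_t& set) {
    std::vector<size_t> cpus;
    for (size_t i = 0; i < CPU_SETSIZE; i++) {
        if (CPU_ISSET(i, &set)) {
            cpus.push_back(i);
        }
    }
    return cpus;
}

// Hypothetical helper: build a mask with at most max_cpus CPUs, chosen at
// random from the CPUs present in `orig` (never from the gaps between them).
cpu_set_t pickRandomCpus(const cpu_set_t& orig, size_t max_cpus, std::mt19937& rng) {
    assert(max_cpus > 0 && "max_cpus must be positive");
    std::vector<size_t> cpus = allowedCpus(orig);
    std::shuffle(cpus.begin(), cpus.end(), rng);

    cpu_set_t out;
    CPU_ZERO(&out);
    for (size_t i = 0; i < cpus.size() && i < max_cpus; i++) {
        CPU_SET(cpus[i], &out);
    }
    return out;
}
```

In real use, `orig` would come from sched_getaffinity(0, sizeof(cpu_set_t), &orig), and the result would be passed to sched_setaffinity(). Whether the current affinity mask actually excludes offline CPUs on this machine is an assumption that needs checking.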

Can you run grep Cpus /proc/self/status?

I wonder if the online status is reflected in the current cpu affinity set.

Sure, here it goes:

$ grep Cpus /proc/self/status
Cpus_allowed:   0055
Cpus_allowed_list:      0,2,4,6

The result seems to be consistent, i.e. it doesn't change across invocations.
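
For reference, the Cpus_allowed value is a hexadecimal bitmask: 0x55 is binary 0101 0101, i.e. bits 0, 2, 4 and 6, which matches Cpus_allowed_list. So the affinity set does reflect the online status. A small sketch of that decoding follows; cpusFromHexMask is a hypothetical helper name, and skipping commas assumes the comma-separated group format the kernel uses for wider masks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decode a Cpus_allowed-style hex mask (e.g. "0055")
// into the list of CPU indices whose bits are set.
std::vector<size_t> cpusFromHexMask(const std::string& mask) {
    std::vector<size_t> cpus;
    size_t bit = 0;
    // Walk from the least significant hex digit (rightmost) to the most significant.
    for (auto it = mask.rbegin(); it != mask.rend(); ++it) {
        const char c = *it;
        if (c == ',') {
            continue;  // group separator on machines with many CPUs
        }
        int v = 0;
        if (c >= '0' && c <= '9') {
            v = c - '0';
        } else if (c >= 'a' && c <= 'f') {
            v = c - 'a' + 10;
        } else if (c >= 'A' && c <= 'F') {
            v = c - 'A' + 10;
        } else {
            assert(false && "unexpected character in mask");
            continue;
        }
        // Each hex digit covers four consecutive CPU indices.
        for (int b = 0; b < 4; b++) {
            if (v & (1 << b)) {
                cpus.push_back(bit + b);
            }
        }
        bit += 4;
    }
    return cpus;
}
```

Calling cpusFromHexMask("0055") yields the indices 0, 2, 4 and 6.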