sched-ext/scx

[Bug Report] `scx_lavd` Error: Failed to load BPF program

Closed this issue · 29 comments

Summary

I built scx_lavd at commit 04c9e7f and got the following error when running it:

⯁ nyx git:(main) ✗ ❯❯❯ sudo ./result/bin/scx_lavd
Error: Failed to load BPF program

Caused by:
    Invalid argument (os error 22)

Expectation

scx_lavd should work as usual.

Additional information

Linux Kernel

⯁ ~ ❯❯❯ uname -ra
Linux nixos-nuc-12 6.8.2-cachyos #1-NixOS SMP PREEMPT_DYNAMIC Tue Mar 26 22:23:34 UTC 2024 x86_64 GNU/Linux

With commit 5bfd90b, scx_lavd was working as expected.

Any workarounds would be appreciated. Thanks.

Have there been any recent changes to the BPF configs?

@miooochi -- Thanks for reporting the bug. Hmm... Could you run sudo ./result/bin/scx_lavd -vvv and share the log? Could you also try the most recent commit?

Hi @multics69, sure thing. Logs available in https://fars.ee/pJGZ

Also tested the latest commit 048662a, same results.

Could you provide the output of uname -a?

⯁ ~ ❯❯❯ uname -ra
Linux nixos-nuc-12 6.8.2-cachyos #1-NixOS SMP PREEMPT_DYNAMIC Tue Mar 26 22:23:34 UTC 2024 x86_64 GNU/Linux

Yeah, you need linux-cachyos-rc for the latest sched-ext API changes.

Edit: No, I was wrong. The changes broke all schedulers, but now they're all fixed as of 7d335fa. You'll need to adjust your packaging scripts accordingly, as it now adds optional openrc support. The AUR -git package builds for systemd.

Thanks for the notes. I'm having difficulty packaging the binary; could you help me out, please? @PedroHLC

Logs from building the derivation:

⯁ nyx git:(main) ✗ ❯❯❯ nix build .#scx
warning: Git tree '/home/kev/Workspace/personal/nyx' is dirty
do you want to allow configuration setting 'extra-substituters' to be set to 'https://nyx.chaotic.cx/' (y/N)?
do you want to permanently mark this value as untrusted (y/N)?
warning: ignoring untrusted flake configuration setting 'extra-substituters'.
Pass '--accept-flake-config' to trust it
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'nyx.chaotic.cx-1:HfnXSw4pj95iI/n17rIDy40agHj12WfF+Gqk6SonIT8= chaotic-nyx.cachix.org-1:HfnXSw4pj95iI/n17rIDy40agHj12WfF+Gqk6SonIT8=' (y/N)?
do you want to permanently mark this value as untrusted (y/N)?
warning: ignoring untrusted flake configuration setting 'extra-trusted-public-keys'.
Pass '--accept-flake-config' to trust it
error: builder for '/nix/store/ywv9a276zzic6fwxwkh0w3wqy34xpw1f-scx-rlfifo-unstable-20240326-5bfd90bd6.drv' failed with exit code 101;
       last 10 log lines:
       > Running phase: updateAutotoolsGnuConfigScriptsPhase
       > Running phase: configurePhase
       > Running phase: buildPhase
       > Executing cargoBuildHook
       > ++ env CC_X86_64_UNKNOWN_LINUX_GNU=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/cc CXX_X86_64_UNKNOWN_LINUX_GNU=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/c++ CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/cc CC_X86_64_UNKNOWN_LINUX_GNU=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/cc CXX_X86_64_UNKNOWN_LINUX_GNU=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/c++ CARGO_TARGET_X86_64_UNKNOWN_LINUX_GNU_LINKER=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/cc CARGO_BUILD_TARGET=x86_64-unknown-linux-gnu HOST_CC=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/cc HOST_CXX=/nix/store/kvlhk0gpm2iz1asbw1xjac2ch0r8kyw9-gcc-wrapper-13.2.0/bin/c++ cargo build -j 20 --target x86_64-unknown-linux-gnu --frozen --profile release
       > error: failed to select a version for the requirement `libbpf-rs = "^0.23"`
       > candidate versions found which didn't match: 0.22.1
       > location searched: directory source `/build/cargo-vendor-dir` (which is replacing registry `crates-io`)
       > required by package `scx_rlfifo v0.0.2 (/build/source/scheds/rust/scx_rlfifo)`
       > perhaps a crate was updated and forgotten to be re-vendored?
       For full logs, run 'nix log /nix/store/ywv9a276zzic6fwxwkh0w3wqy34xpw1f-scx-rlfifo-unstable-20240326-5bfd90bd6.drv'.
error: 1 dependencies of derivation '/nix/store/0h0wqhl6dafbjj83mcv979zbabq352qg-scx-unstable-20240326-5bfd90bd6.drv' failed to build

Diff

⯁ nyx git:(main) ✗ ❯❯❯ git diff HEAD .
diff --git a/pkgs/scx/common.nix b/pkgs/scx/common.nix
index a5f5d9a..366681c 100644
--- a/pkgs/scx/common.nix
+++ b/pkgs/scx/common.nix
@@ -6,8 +6,8 @@ rec {
   src = fetchFromGitHub {
     owner = "sched-ext";
     repo = "scx";
-    rev = "5bfd90bd64bc72df64456ed187b06bb21d3b873b";
-    hash = "sha256-/9BDXe9oaa7xPR3ZnqR6euioo2j55PIjw7K8O2w5M6c=";
+    rev = "7d335fa1970e9a4a3a8da2dadde9762556eafe87";
+    hash = "sha256-dZR0lgODeK3A5kDi0T2jXeFZFyZMedGzvxlhiNryX2A=";
     fetchSubmodules = true;
   };

diff --git a/pkgs/scx/lavd/Cargo.lock b/pkgs/scx/lavd/Cargo.lock
index 63142c0..cc867ab 100644
--- a/pkgs/scx/lavd/Cargo.lock
+++ b/pkgs/scx/lavd/Cargo.lock
@@ -90,7 +90,7 @@ dependencies = [
  "regex",
  "rustc-hash",
  "shlex",
- "syn 2.0.55",
+ "syn 2.0.57",
  "which",
 ]

@@ -229,7 +229,7 @@ dependencies = [
  "heck 0.5.0",
  "proc-macro2",
  "quote",
- "syn 2.0.55",
+ "syn 2.0.57",
 ]

 [[package]]
@@ -516,9 +516,9 @@ checksum = "90ed8c1e510134f979dbc4f070f87d4313098b704861a105fe34231c70a3901c"

 [[package]]
 name = "memchr"
-version = "2.7.1"
+version = "2.7.2"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "523dc4f511e55ab87b694dc30d0f820d60906ef06413f93d4d7a1385599cc149"
+checksum = "6c8640c5d730cb13ebd907d8d04b52f55ac9a2eec55b440c8892f40d56c76c1d"

 [[package]]
 name = "memmap2"
@@ -711,7 +711,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "8d3928fb5db768cb86f891ff014f0144589297e3c6a1aba6ed7cecfdace270c7"

> I'm having difficulty packaging the binary; could you help me out, please? @PedroHLC

Sorry, the current revision (7d335fa), like the previous one, does not build for me:

> FAILED: scheds/c/scx_nest.p/scx_nest.c.o
> clang -Ischeds/c/scx_nest.p -Ischeds/c -I../scheds/c -I../scheds/include -I/nix/store/6rl4sxs29wir0x0bcpa29mpj277k3ycq-libbpf-1.3.0/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -pthread -MD -MQ scheds/c/scx_nest.p/scx_nest.c.o -MF scheds/c/scx_nest.p/scx_nest.c.o.d -o scheds/c/scx_nest.p/scx_nest.c.o -c ../scheds/c/scx_nest.c
> ../scheds/c/scx_nest.c:190:2: error: no member named 'struct_ops' in 'struct scx_nest'
>   190 |         SCX_OPS_LOAD(skel, nest_ops, scx_nest, uei);
>       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../scheds/include/scx/compat.h:120:2: note: expanded from macro 'SCX_OPS_LOAD'
>   120 |         UEI_SET_SIZE(__skel, __ops_name, __uei_name);                           \
>       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../scheds/include/scx/user_exit_info.h:57:24: note: expanded from macro 'UEI_SET_SIZE'
>    57 |         u32 __len = (__skel)->struct_ops.__ops_name->exit_dump_len ?: UEI_DUMP_DFL_LEN; \
>       |                     ~~~~~~~~  ^
> ../scheds/c/scx_nest.c:190:2: error: no member named 'struct_ops' in 'struct scx_nest'
>   190 |         SCX_OPS_LOAD(skel, nest_ops, scx_nest, uei);
>       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../scheds/include/scx/compat.h:122:16: note: expanded from macro 'SCX_OPS_LOAD'
>   122 |             (__skel)->struct_ops.__ops_name->exit_dump_len) {                   \
>       |             ~~~~~~~~  ^
> ../scheds/c/scx_nest.c:190:2: error: no member named 'struct_ops' in 'struct scx_nest'
>   190 |         SCX_OPS_LOAD(skel, nest_ops, scx_nest, uei);
>       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ../scheds/include/scx/compat.h:124:13: note: expanded from macro 'SCX_OPS_LOAD'
>   124 |                 (__skel)->struct_ops.__ops_name->exit_dump_len = 0;     \
>       |                 ~~~~~~~~  ^
> 3 errors generated.

I'm hitting this compilation error with the recently tagged 0.1.8 too. All the Rust schedulers compiled successfully, and the configure phase shows all green "YES".

That looks like it's caused by bpftool being too old. I suppose you're using the system-packaged bpftool for packaging?

What does bpftool --version say?

bpftool --version

bpftool v7.3.0
using libbpf v1.3
features: libbfd

> bpftool being too old

If it wouldn't be too much to ask, could it detect the minimum version in the configure phase?

Yeah, we depend on v7.4.0 now, and we definitely should add a bpftool version check.

Same error with this one:

scx> Running phase: buildPhase
scx> bpftool v7.4.0
scx> using libbpf v1.4
scx> features: libbfd
scx> build flags: -j22

Can you attach scheds/c/scx_nest.p/scx_nest.bpf.skel.h in the build directory? Just in case, this was a clean build with the new bpftool, right?

> Just in case, this was a clean build with the new bpftool, right?

This is Nix, so (1) it clones the repo into a new tmpdir to build, and (2) I don't have a /usr/include that could interfere (all the headers, libraries, and binaries involved are pulled in through their pkg-config/cmake/wrap files from absolute paths pinned by version and hash). The bpftool --version above ran in this clean environment, between the configure and build phases.

> Can you attach scheds/c/scx_nest.p/scx_nest.bpf.skel.h in the build directory?

If I look at the leftovers, I can only see these:

╰─λ find . -iname '*.skel.h*'
./build/scheds/c/scx_simple.p/scx_simple.bpf.skel.h
./build/scheds/c/scx_qmap.p/scx_qmap.bpf.skel.h
./build/scheds/c/scx_central.p/scx_central.bpf.skel.h
./build/scheds/c/scx_pair.p/scx_pair.bpf.skel.h
./build/scheds/c/scx_flatcg.p/scx_flatcg.bpf.skel.h
./build/scheds/c/scx_userland.p/scx_userland.bpf.skel.h

Note that all these are failing:

FAILED: scheds/c/scx_qmap.p/scx_qmap.c.o 
clang -Ischeds/c/scx_qmap.p -Ischeds/c -I../scheds/c -I../scheds/include -I/nix/store/mx3pmy1xia2ciy0pl6dcm2pj64af3gpb-libbpf-1.4.0/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -pthread -MD -MQ scheds/c/scx_qmap.p/scx_qmap.c.o -MF scheds/c/scx_qmap.p/scx_qmap.c.o.d -o scheds/c/scx_qmap.p/scx_qmap.c.o -c ../scheds/c/scx_qmap.c
FAILED: scheds/c/scx_central.p/scx_central.c.o 
clang -Ischeds/c/scx_central.p -Ischeds/c -I../scheds/c -I../scheds/include -I/nix/store/mx3pmy1xia2ciy0pl6dcm2pj64af3gpb-libbpf-1.4.0/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -pthread -MD -MQ scheds/c/scx_central.p/scx_central.c.o -MF scheds/c/scx_central.p/scx_central.c.o.d -o scheds/c/scx_central.p/scx_central.c.o -c ../scheds/c/scx_central.c
FAILED: scheds/c/scx_pair.p/scx_pair.c.o 
clang -Ischeds/c/scx_pair.p -Ischeds/c -I../scheds/c -I../scheds/include -I/nix/store/mx3pmy1xia2ciy0pl6dcm2pj64af3gpb-libbpf-1.4.0/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -pthread -MD -MQ scheds/c/scx_pair.p/scx_pair.c.o -MF scheds/c/scx_pair.p/scx_pair.c.o.d -o scheds/c/scx_pair.p/scx_pair.c.o -c ../scheds/c/scx_pair.c
FAILED: scheds/c/scx_userland.p/scx_userland.c.o 
clang -Ischeds/c/scx_userland.p -Ischeds/c -I../scheds/c -I../scheds/include -I/nix/store/mx3pmy1xia2ciy0pl6dcm2pj64af3gpb-libbpf-1.4.0/include -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -pthread -MD -MQ scheds/c/scx_userland.p/scx_userland.c.o -MF scheds/c/scx_userland.p/scx_userland.c.o.d -o scheds/c/scx_userland.p/scx_userland.c.o -c ../scheds/c/scx_userland.c

All with the same error (mentioned above).

Can you attach scx_qmap.bpf.skel.h? Thanks.

scx_qmap.bpf.skel.h.txt

(GitHub supports uploading as .txt but not as .h)

I can still build it with bpftool 7.3 using the AUR package inside a chroot, so it's likely that something else is missing.

So, your .skel is missing the .struct_ops field. This is a newish addition, and the skel files are generated by bpftool, so I'm a bit perplexed that it's failing with the latest bpftool. For reference, the following is the same skel file with the .struct_ops field. I'll ask the BPF folks for help.

scx_qmap.bpf.skel.h.txt

@PedroHLC Can you please retry with the current git HEAD? BPF folks say that the bpftool version is the only thing that determines whether .struct_ops will be in the generated skel file. I added a bpftool version check to the build script, so hopefully we'll be able to learn what's going on.

> Can you please retry with the current git HEAD?

The bpftool version isn't printed in the logs, but the build didn't trigger your error message:

scx> Found pkg-config: YES (/nix/store/fyxva0kkcmaigwk4218l0zdy8z3s9sj3-pkg-config-wrapper-0.29.2/bin/pkg-config) 0.29.2
scx> Run-time dependency libbpf found: YES 1.4.0
scx> Program bpftool found: YES (/nix/store/yq1w88g8lisakfvdyinydrf1kd649bna-bpftools-6.8.3/bin/bpftool)
scx> Message: cpu=x86_64 bpf_base_cflags=['-g', '-O2', '-Wall', '-Wno-compare-distinct-pointer-types', '-D__TARGET_ARCH_x86', '-mcpu=v3', '-mlittle-endian', '-idirafter /nix/store/q165v8w3hn4cixf5cc58xwpp9al6k2kh-jq-1.7.1-dev/include', '-idirafter /nix/store/680fjvdzv4i81a40yzwga0z64jw1yczb-compiler-rt-libc-17.0.6-dev/include', '-idirafter /nix/store/s70sxxwsbn30cl4ayn7s0kwssg3mcfrk-elfutils-0.191-dev/include', '-idirafter /nix/store/znghzib6sz0pnpq5b6k2rl3r7wdcjvjw-zlib-1.3.1-dev/include', '-idirafter /nix/store/mx3pmy1xia2ciy0pl6dcm2pj64af3gpb-libbpf-1.4.0/include', '-idirafter /nix/store/xxbc7fnvar70sbq8ckks4fl8jagnw52y-clang-wrapper-17.0.6/resource-root/include', '-idirafter /nix/store/gzxqm8dyfirbysqjhh78ivam62ll0m87-glibc-2.39-5-dev/include']

Used commit:

╰─λ nix eval .#scx.src.rev
"1b897ae24b535b0e08c3d444652ef32b36f7b7e6"

The mentioned binary:

╰─λ /nix/store/yq1w88g8lisakfvdyinydrf1kd649bna-bpftools-6.8.3/bin/bpftool --version
bpftool v7.4.0
using libbpf v1.4
features: libbfd

I'm out of ideas. The build script builds BPF object files with clang and invokes bpftool to generate the .bpf.skel.h files. If bpftool is recent enough, which it seems to be, it should generate .struct_ops in the skel, but it doesn't.

What'd be the simplest way to reproduce the problem you're experiencing?

My bpftool on Arch lists the following --version output:

bpftool v7.3.0
using libbpf v1.3
features: llvm, skeletons

The features missing from Nix's bpftool may be a hint.

I won't have much time to debug this today, but even though everything shows green while building bpftools:

bpftools> build flags: SHELL=/nix/store/a1s263pmsci9zykm5xcdf7x9rv26w6d5-bash-5.2p26/bin/bash bpftool bpf_asm bpf_dbg
bpftools> Auto-detecting system features:
bpftools> ...                                  libbfd: [ on  ]
bpftools>   DESCEND bpftool
bpftools> Auto-detecting system features:
bpftools> ...                         clang-bpf-co-re: [ on  ]
bpftools> ...                                    llvm: [ on  ]
bpftools> ...                                  libcap: [ on  ]
bpftools> ...                                  libbfd: [ on  ]

I can't get a build that has the skeletons feature:

scx> bpftool v7.4.0
scx> using libbpf v1.4
scx> features: llvm
scx> build flags: -j22

> What'd be the simplest way to reproduce the problem you're experiencing?

With Nix installed, run this (the shebang works):

#! /usr/bin/env nix-build
{ pkgs ? import <nixpkgs> {} }: with pkgs;

(bpftools.override {
  linuxHeaders =
    # Bumps to bpftools v7.4.0
    makeLinuxHeaders {
      inherit (linux_6_8) src version patches;
    };
  stdenv =
    # Enables "clang-bpf-co-re" feature
    llvmPackages.stdenv;
}).overrideAttrs (prevAttrs: {
  buildInputs = prevAttrs.buildInputs ++ [
    # Enables "llvm" feature
    llvmPackages.llvm
    # Enables "libcap" feature
    libcap
  ];
  buildFlags = prevAttrs.buildFlags ++ [
    # try to build 'pid_iter.skel.h'
    "BUILD_BPF_SKELS=1"
  ];
})

It will fail because "pid_iter.skel.h" won't be generated...

Note that this is an override, so it builds on top of what is done here: https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/os-specific/linux/bpftools/default.nix (which adds other dependencies, specifically: python3, bison, flex, libopcodes, libbfd, elfutils, zlib, and readline).

Removing "BUILD_BPF_SKELS=1" will generate this result:

╰─λ ./result/bin/bpftool --version
bpftool v7.4.0
using libbpf v1.4
features: llvm
╰─λ ./result/bin/bpftool gen skeleton
Error: 'skeleton' needs at least 1 arguments, 0 found
Usage: bpftool gen object OUTPUT_FILE INPUT_FILE [INPUT_FILE...]

I can confirm that the issue has been fully resolved. Thanks all!

@htejun I guess whatever revision of bpftools that comes with the linux 6.8 sources is borked. I tested with 20ce6933869b70bacfdd0dd1a8399199290bf8ff from libbpf's GitHub, and it works...