cilium/ebpf

`LoadPinned*` silently succeeds on objects of wrong type

mtardy opened this issue · 5 comments

I was writing something to parse objects from a /sys/fs/bpf fs and was expecting to receive an error when using ebpf.LoadPinnedMap on a prog and ebpf.LoadPinnedProg on a map. Instead, the function properly returns a struct but with garbage data in each "wrong" case.

So I've looked into it, it seems that the kernel doesn't expose anything to know in advance, given a BPF pinned object file, if it's a map or a prog. When you stat such a file there's no way to know if it's going to be a BPF map or prog.

However, when you open such a file and you end up with a file descriptor, if you readlink this fd, you retrieve something like anon_inode:bpf-map or anon_inode:bpf-prog. See here for map fd for example. This, and reading /proc/self/fdinfo/<fd> allows you to discriminate, since you'll bump into fields named map_<...> for maps and prog_<...> for progs.

All of that to say that we could use the readlink tricks to return an error properly instead of bogus information from LoadPinnedMap or LoadPinnedProg, @dylandreimerink suggested adding this to the newProgramInfoFromFd for example, we could also do newMapInfoFromFd respectively:

ebpf/info.go

Line 149 in 88736f4

func newProgramInfoFromFd(fd *sys.FD) (*ProgramInfo, error) {

ebpf/info.go

Line 54 in 88736f4

func newMapInfoFromFd(fd *sys.FD) (*MapInfo, error) {

On another topic, I think it would be great to have an API to parse objects indifferently if they are a map or a prog. A bit of what we can do right now with a "hack": reading everything, map and prog, as prog for example, since it doesn't return an error and then read common fields like memlock in fdinfo. It could be something like LoadPinned that would be common between maps and progs, and thus we could retrieve memlock using this object, or cast it into a map or prog if we want to go further.

I could elaborate on my use case and see if it makes sense but would love to know what people think :)!

Yep, we should build some guard rails into the Load* functions that sanity-check fdinfo. IIRC fdinfo hasn't always been available for bpffs nodes, but my memory's a little fuzzy since it's so long ago.

I ran the same train of thought a while ago when considering bpffs garbage collection in Cilium, since on newer kernels it exclusively uses bpf_link for everything now (xdp, tcx, cgroups, netkit) and it would be nice to check for defunct links on startup. Let's consider the general use case of crawling a given bpffs path and dumping all objects.

We should make it trivial to write code that can do this in a loop:

- LoadPinnedMap
- LoadPinnedProgram
- LoadPinnedLink

The caller can write a function that walks bpffs and inserts any successful objects into a slice or map of the respective return type (*Map, *Program, *Link). This can be done in a few lines of code.

Regarding

I think it would be great to have an API to parse objects indifferently if they are a map or a prog.

I wouldn't go for a LoadPinnedObject(path string) any, since that wouldn't make the LoC call site any shorter. The caller would still need a type switch on *Map, *Program and *Link and put the result of the type assertion into a slice/map. Returning any from such a function is bad UX in general. Syscall efficiency may be a bit higher, but we're talking about a few extra open and readlink calls against virtual fs entries, so a worthwhile trade-off IMO.

Happy to guide anyone willing to work on this, let me know.

Okay cool, I can try to write a patch to return an error on wrong-BPF-object using readlink in newSomethingFromFd functions if you think that's appropriate. Cool. Agree for the generic interface, it doesn't make a huge difference to just check against a known error (WrongBPFObject or whatever) returned by the function and try again with another one.

@mtardy FYI some things are moving in #1570. This patch now always consults fdinfo regardless if ObjInfo is supported on the kernel or not.

We could include a call to scanFdInfo(fd, map[string]interface{}{"map_type": &foo}) at the start of newMapFromFD to check if it's a map. readlink would be more elegant, we need to check if it works on all kernels though. Let me know what solution you come up with!