latchset/crypto-auditing

Collect instrumented library name

Closed this issue · 3 comments

ueno commented

It would be nice to collect the names of the libraries in which probes are instrumented. This could be done in the following steps:

  • The BPF program reports the PID and the topmost symbol address obtained with bpf_get_stack(..., BPF_F_USER_STACK)
  • The agent reads the /proc/$PID/map_files directory and finds the library that is mapped to the memory area containing the symbol address

As this lookup may affect performance, a caching mechanism would be desirable.
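The lookup described above can be sketched roughly as follows. This is only an illustration, not code from the agent: the function names (`parse_range`, `find_mapping`, `read_map_files`, `library_for`) are hypothetical, and it assumes that each entry in /proc/$PID/map_files is a symlink named `<start>-<end>` in hexadecimal that points at the mapped file. Reading that directory usually requires elevated privileges.

```python
import os
from functools import lru_cache


def parse_range(entry_name):
    """Parse a map_files entry name such as '7f0000000000-7f0000001000'."""
    start, end = entry_name.split("-")
    return int(start, 16), int(end, 16)


def find_mapping(entries, address):
    """Return the file mapped at `address`, or None if nothing matches.

    `entries` is an iterable of (entry name, symlink target) pairs.
    """
    for name, target in entries:
        start, end = parse_range(name)
        if start <= address < end:
            return target
    return None


def read_map_files(pid):
    """Collect (entry name, symlink target) pairs for one process."""
    directory = f"/proc/{pid}/map_files"
    return [
        (name, os.readlink(os.path.join(directory, name)))
        for name in os.listdir(directory)
    ]


@lru_cache(maxsize=1024)
def library_for(pid, address):
    """Cached lookup. Note that the cache is keyed only on (pid, address),
    so it goes stale if the PID is reused or the process remaps memory."""
    return find_mapping(read_map_files(pid), address)
```

The matching logic is kept separate from the /proc access so it can be exercised with made-up entries. The stale-cache caveat in `library_for` is one example of the extra complexity that caching brings.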

One process can map two libraries, and caching means additional complexity.

I'd rather see the library name specified with the probes.

Even if we can get a file name, the same path might point to different libraries in different mount namespaces. I think passing the library name is just a lot simpler and easier.

ueno commented

I did a further experiment, and it seems bpf_get_stack with BPF_F_USER_STACK | BPF_F_USER_BUILD_ID actually returns the build-id embedded in the shared library. In my case:

$ readelf -nW /usr/lib64/libgnutls.so.30
Displaying notes found in: .note.gnu.build-id
  Owner                Data size        Description
  GNU                  0x00000014       NT_GNU_BUILD_ID (unique build ID bitstring)         Build ID: d492f928d58a9176e3a6bb93a4e2d29286b8e155
...
$ cborseq2diag.rb audit.cborseq
{"context": h'1C22114812314EB015E451B8DF59814A', "origin": h'01000000D492F928D58A9176E3A6BB93A4E2D29286B8E15592EB09000000000001000000D492F928D58A9176E3A6BB93A4E2D29286B8E155CEEA040000000000', ...}

where origin is the output of bpf_get_stack.
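To make the origin value easier to read, here is a minimal decoding sketch. It assumes the buffer is an array of the kernel's struct bpf_stack_build_id entries (a 32-bit status, a 20-byte build-id, and a 64-bit offset or ip) laid out in little-endian byte order, as on x86-64. The helper name `decode_origin` is hypothetical.

```python
import struct

# One struct bpf_stack_build_id entry: __s32 status, 20-byte build_id,
# __u64 offset/ip. Little-endian layout is an assumption (e.g. x86-64).
ENTRY = struct.Struct("<i20sQ")

# Status values from the kernel UAPI header.
BPF_STACK_BUILD_ID_EMPTY = 0
BPF_STACK_BUILD_ID_VALID = 1
BPF_STACK_BUILD_ID_IP = 2


def decode_origin(origin):
    """Split the raw bpf_get_stack output into (status, build-id, offset)."""
    return [
        (status, build_id.hex(), offset)
        for status, build_id, offset in ENTRY.iter_unpack(origin)
    ]


# The origin value from the audit log above.
origin = bytes.fromhex(
    "01000000D492F928D58A9176E3A6BB93A4E2D29286B8E15592EB090000000000"
    "01000000D492F928D58A9176E3A6BB93A4E2D29286B8E155CEEA040000000000"
)

for status, build_id, offset in decode_origin(origin):
    print(status, build_id, hex(offset))
```

Read this way, the sample holds two stack frames. Both have status BPF_STACK_BUILD_ID_VALID and carry the same build-id that readelf reports for libgnutls.so.30, with file offsets 0x9eb92 and 0x4eace.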