euspectre/kedr

detecting sk_buff leaks

Closed this issue · 6 comments

Original issue 4 created by euspectre on 2012-02-22T14:53:35.000Z:

Hello,

I'd like KEDR to track sk_buff alloc/free. I wrote some basic code to track __alloc_skb, skb_clone and kfree_skb. It's not perfect because it only tracks struct sk_buff allocation/free, not the data buffer it points to, but that doesn't matter for me here.

Now here's my problem:

The sk_buff is allocated by me on the send side and given to the network. It's basically free'd by the network driver after send (outside of the module I am debugging). On the receive side, the incoming sk_buff is allocated by the driver before giving it to me, and I take care of freeing it.

I worked around the send side problem by declaring that dev_queue_xmit() takes care of freeing my sk_buff. The freeing doesn't occur in dev_queue_xmit() for real, but that's where my module last sees its outgoing sk_buff, so that's fine.

On the receive side, it's harder. If the Linux receive model was my module calling a core/driver function to get the next incoming sk_buff, I would just tell KEDR that this function allocates the sk_buff. But the Linux model is the opposite: the core/driver calls a callback in my module with the incoming skb as a parameter.

So I basically need a way to tell KEDR "the skb passed to my module here has been allocated earlier, now please track it". How can I do that?

Brice Goglin

PS: Yeah, I'll share my code if you're interested

Comment #1 originally posted by euspectre on 2012-02-23T17:45:00.000Z:

A tool to track allocation/deallocation of sk_buff structures is a very useful one indeed. I think many users could benefit from it, so if you share the code, it would be great.

As for the problem, KEDR itself currently has no way to recognize that an object was allocated/freed by another module or by the kernel proper. We also faced this problem in our projects.

Two ideas:

  1. Would it be OK for your tool if KEDR could track several kernel modules as a single entity? That is, if it could replace the calls in more than one module with the calls to your functions? This way, you could track allocation/deallocation of sk_buff structures both in the module you actually debug and in the underlying network driver as if they were a single module.

This feature is already in our TODO list but it may take some time to implement it.

  1. A colleague of mine and the co-author if KEDR, Andrey Tsyvarev has developed a system to track callback operations in the kernel modules, KEDR-COI:
    http://code.google.com/p/kedr-callback-operations-interception/
    So far, we have used it to track file operations, inode operations and the like. But it also provides the means to prepare the components to track other types of callbacks.

The idea of that system is to intercept the calls to the functions that have registered the objects with callbacks of interest and to replace the callbacks in these objects.

Using KEDR-COI and KEDR at the same time, you could probably intercept the callback in the module under analysis that receives that newly created sk_buff instance and track that instance thereafter.

KEDR-COI is still experimental but it can be useful.
The documentation for KEDR-COI is available in its package, see sources/doc/html/.

Comment #2 originally posted by euspectre on 2012-02-23T21:08:23.000Z:

KEDR-COI looks interesting indeed. I'll see if I can take some time to test it. In my case, you would have to intercept dev_add_pack() and dev_remove_pack() (these are where ethernet-type-specific receive handler callbacks are added/removed).

I am not fully convinced by idea (1). Three reasons:

  • I thought that KEDR required the target module to be loaded after KEDR. My network drivers are usually loaded at boot time because I need them to ssh on the machine before I use KEDR.
  • I wonder if some corner cases in the receive stack could have the core kernel play with sk_buff between drivers and my module.
  • My network drivers also receive normal traffic such as IP. So I would need to either track the tcp/ip modules with KEDR, or have a way to filter TCP/IP sk_buff allocations out of the KEDR output.

In short, I prefer idea (2) :)

My patch is attached. If you need a copyright header, it's:
Copyright © 2012 inria. All rights reserved.

Comment #3 originally posted by euspectre on 2012-02-24T08:19:27.000Z:

Got your point concerning the first idea. Yes, analyzing several modules as a group seems to do little good in case of your task. And you are right, KEDR needs to be loaded before the target module and it cannot analyze the kernel proper, only the modules.

Thank you for the patch. Now I see that you have actually patched LeakCheck rather than created a separate plugin for KEDR as I thought before. It is all quite different and probably makes it harder to use KEDR-COI than if sk_buff operations were tracked by a separate plugin.

For some time now I have been planning to redesign LeakCheck to some extent. May be it would help here too. The idea is to separate its core (analysis engine) from the components that collect alloc/free events and to allow creating custom components of this kind. To put it simple:

  • LeakCheck core could export API functions like
    kedr_lc_alloc_event(const void *block, size_t size, const void *caller_address, <...>)
    and
    kedr_lc_free_event(const void *block, const void *caller_address, <...>)

Similar to what already exists in leak_check/mbi_ops.* but is not exported yet.

  • Different plugins (probably, separate kernel modules) could intercept different groups of alloc/free routines and call kedr_lc_*_event() when appropriate.
  • The core would analyze the data and report the results the same way as LeakCheck does it now.

In our work, we would benefit from this redesign too. Among other things, it would made much easier to use LeakCheck with KEDR-COI, which we need when we analyze file system modules. In addition, I do not like the current monolythic structure of LeakCheck. The Linux kernel ABI is ever-changing and we need to make sure our system is usable on all the versions and variants of the kernel we aim to support. A modular LeakCheck would make things easier to some extent. I would also like to give the user the means to choose what she would like to check with LeakCheck, e.g. slab allocs, page allocs, sk_buff operations like your tool does, drm allocs, or may be all of this at the same time, etc.

If the above is implemented, a tool to track sk_buff operations could be a separate plugin to KEDR, a separate kernel module. That module would use KEDR core API to intercept the calls to the necessary exported functions (same as it is now). A custom interceptor created with the help of KEDR-COI would help tracking the appropriate callbacks. Your plugin would call kedr_lc_*_event() when appropriate and LeakCheck would analyze the collected data.

Both Andrey and I are currently working on another system that I hope would make it into KEDR framework in the future (http://code.google.com/p/kernel-strider/).
But I have some time now, so I will try to implement that LeakCheck redesign as I have described.

Ideas, comments and suggestions are always welcome :-)

Comment #4 originally posted by euspectre on 2012-03-02T13:00:08.000Z:

I have revisited LeakCheck as I described above. Among other things, it now provides the API that other kernel modules can use to tell LeakCheck something like "this object has been allocated" or "this object is about to be deallocated". It should also be easier now to add support for tracking custom functions.

I have written a brief description of the API and of the examples: http://code.google.com/p/kedr/wiki/LeakCheck_API

If you would like to use these features, you could build KEDR from the most recent revision available here in the repository. It should be stable enough, at least, it passes our tests on a dozen of systems with different variants of Linux and versions of the kernel ;-)

The examples ("Mempool_ops" and "Annotations") I have also prepared could serve as a good starting point for your task.

Comment #5 originally posted by euspectre on 2012-08-30T07:47:06.000Z:

Shall we call this fixed? Are there any issues here that need to be resolved?

Comment #6 originally posted by euspectre on 2012-09-27T14:10:10.000Z:

OK, marking as fixed. If anyone is having issues related to this one, feel free to open another bug here.