euspectre/kedr

Leak check: "flush" leak statistics at any time (patch included)

Closed this issue · 11 comments

Original issue 9 created by euspectre on 2013-04-08T18:24:56.000Z:

Hello!

I think it would be nice to have an option to flush
statistics at any time.

There are at least two reasons:

  • Certain drivers cannot be unloaded without breaking something. There may be a way to do that, though, but it's certainly tricky, if any. Who would want to unload a video card driver during normal usage, anyway?
  • Sometimes it's useful to isolate leaks that occur during certain time span.

The patch is pretty hackish, and, I guess, it does require additional work. But it works, and, if I understand the code correctly, shouldn't cause any leaks or crashes.

Comment #1 originally posted by euspectre on 2013-04-09T08:01:54.000Z:

Thanks for the patch. Interesting.

I think this may be useful. If unloading of the driver is so tricky as it is for video drivers, it could be reasonable to obtain the stats earlier, despite the possibly high number of false positives.

One thing I noticed at the first glance is that the flushing occurs when the user opens "flush" file in debugfs. It would be more appropriate to do that on write to that file. This would be more similar to kmemleak and other systems.

I think, I'll look into the patch in more detail in a couple of days and will rewrite is as needed.

Comment #2 originally posted by euspectre on 2013-04-09T08:02:25.000Z:

<empty>

Comment #3 originally posted by euspectre on 2013-04-09T18:08:45.000Z:

Additionally, it might be convenient to write some parameters regarding statistics output format to that control file.
I have some ideas of making the output more useful. Some of them would prevent KEDR from "collapsing" identical stack traces, so it should better be optional.
I'll send them in separate patches.

Comment #4 originally posted by euspectre on 2013-04-10T06:55:53.000Z:

Such things should be optional indeed. Without "collapsing" of the similar stack traces, it is currently very easy to run out of memory and to get some other problems.

Looking forward to seeing your ideas.

Comment #5 originally posted by euspectre on 2013-04-15T11:39:04.000Z:

I have implemented flushing of results and pushed the changes to the repository.

Please check if that works for you.

The idea is the same as in the original proposed patch. The implementation is different in several places because it was needed to handle the following.

  1. Before these changes, LeakCheck deleted the information about leaks right after it had passed it to the output system. I needed to separate these two activities because each flush would remove the accumulated info and the subsequent flush operations would yield inconsistent results.
  2. A user may try to flush the stats after the target has been unloaded and loaded again. So, it is better to avoid saving the address of 'struct module' for the target in the inode for the "flush" file and use it after that. The address may be different when the target is reloaded. The current implementation handles that case too.

Thanks for proposing this enhancement!

Comment #6 originally posted by euspectre on 2013-04-17T14:15:15.000Z:

<empty>

Comment #7 originally posted by euspectre on 2013-04-17T20:34:05.000Z:

Just tested: works like a charm. Thanks.

Comment #8 originally posted by euspectre on 2013-04-17T21:21:44.000Z:

It would be great if I actually could delete the accumulated info. Because most of the time I use flush I'm not interested in old allocations at all.

To clarify:

  1. Driver allocates a lot of structures at start. They will never go away, unless the driver is unloaded. As I mentioned earlier, it's inconvenient to unload it.
  2. At this point I want to delete the data.
  3. Then I run some test that's supposed to cause allocation and deallocation.
  4. Finally I run 'flush' to verify that the test didn't cause any leaks. If the step 2 was skipped, I would get a lot of noise about much older allocations. I don't need it.

I think I have an idea how to solve it in an elegant way. We can embed timestamps into resource info objects, and make flush file accept timestamp as parameter (that is, print leaks corresponding to allocations younger than specified timestamp).

Comment #9 originally posted by euspectre on 2013-04-18T07:49:34.000Z:

Ah, I see.

A "cheaper" option would be another control file in debugfs named, say, "clear". If one writes something to it, LeakCheck will forget the data about allocations and deallocations accumulated so far.

Among other things, this would make LeakCheck look more like Kmemleak tool included into the kernel. Kmemleak uses a single file in debugfs for both flushing and clearing though (depends on what the user writes there).

Timestamps are nice but this functionality can be tricky to implement right.

Comment #10 originally posted by euspectre on 2013-04-27T11:33:29.000Z:

I have added "clear" file. The implementation is in the repository.

Comment #11 originally posted by euspectre on 2013-05-14T06:55:25.000Z:

Marking as fixed. Feel free to reopen if there are any problems with the implementation.