namhyung/uftrace

Add sampling-based filtering support

namhyung opened this issue · 2 comments

For a large scale project, it's easy to get a huge data after running it with uftrace. Although uftrace has various filters to reduce the amount of trace data, it sometimes to hard to know what/how to apply filters. Another approach to achieve this is sampling - it records the data at some frequency N. IOW it'll only trace the function on every Nth execution. But it still needs a target function.

Let's talk with an example. I'd like to start with our lovely t-abc. :) But imagine it's changed to call function a 5 times in the main (let's call it t-abc5). And you don't want to trace them all, and just want to do it every other time (N = 2). Then I think we can do like:

$ uftrace -F main -F a@sample=2  t-abc5
# DURATION     TID     FUNCTION
            [1356607] | main() {
            [1356607] |   a() {                   # 2nd execution
            [1356607] |     b() {
            [1356607] |       c() {
   1.536 us [1356607] |         getpid();
   2.400 us [1356607] |       } /* c */
   2.863 us [1356607] |     } /* b */
   3.205 us [1356607] |   } /* a */
            [1356607] |   a() {                   # 4th execution
            [1356607] |     b() {
            [1356607] |       c() {
   0.945 us [1356607] |         getpid();
   1.490 us [1356607] |       } /* c */
   1.752 us [1356607] |     } /* b */
   2.066 us [1356607] |   } /* a */
  11.418 us [1356607] | } /* main */

Note that the sampling was applied to -F filter so it'd show/hide all function under the target. Also I had to add -F main to show the higher level functions too.

It should work with normal triggers (-T) too, then I think it only affects the target function (unless it has filter action, of course), not the children.

$ uftrace -T a@sample=2  t-abc5
# DURATION     TID     FUNCTION
   1.468 us [1358575] | __monstartup();
   0.811 us [1358575] | __cxa_atexit();
            [1358575] | main() {
            [1358575] |   b() {                   # no 'a' on 1st
            [1358575] |     c() {
   0.908 us [1358575] |       getpid();
   1.366 us [1358575] |     } /* c */
   1.638 us [1358575] |   } /* b */
            [1358575] |   a() {
            [1358575] |     b() {
            [1358575] |       c() {
   0.824 us [1358575] |         getpid();
   1.159 us [1358575] |       } /* c */
   1.393 us [1358575] |     } /* b */
   1.629 us [1358575] |   } /* a */
            [1358575] |   b() {                   # no 'a' on 3rd
            [1358575] |     c() {
   1.144 us [1358575] |       getpid();
   1.472 us [1358575] |     } /* c */
   1.767 us [1358575] |   } /* b */
            [1358575] |   a() {
            [1358575] |     b() {
            [1358575] |       c() {
   1.118 us [1358575] |         getpid();
   1.452 us [1358575] |       } /* c */
   1.757 us [1358575] |     } /* b */
   2.004 us [1358575] |   } /* a */
            [1358575] |   b() {                   # no 'a' on 5th
            [1358575] |     c() {
   0.908 us [1358575] |       getpid();
   1.357 us [1358575] |     } /* c */
   1.671 us [1358575] |   } /* b */
  11.683 us [1358575] | } /* main */

Some consideration on the counter.

I don't want to make the counter global (per function) which requires atomic operation every time. As I expect the target function will be called frequently, keeping it per-thread should be fine. That means it might not see the target function even if it's called 2N - 2 times globally, but when two threads called it N - 1 times each. In that case, users should select a smaller N and try again.

More complex cases. The child function can have its own sample filter. Then we need to think about how to count the frequency. I think it should count as same as users see in the final output - IOW do NOT count filtered-out functions.

For example, let's use sample filters for both a and c functions.

$ uftrace -F a@sample=2 -F c@sample=2  t-abc5
# DURATION     TID     FUNCTION
            [1363218] | a() {                    # 2nd execution of 'a'
   1.935 us [1363218] |   b();                   # 1st execution of 'c' - no sample
   2.329 us [1363218] | } /* a */
            [1363218] | a() {                    # 4th execution of 'a'
            [1363218] |   b() {
            [1363218] |     c() {                # 2nd execution of 'c'
   0.444 us [1363218] |       getpid();
   0.772 us [1363218] |     } /* c */
   0.959 us [1363218] |   } /* b */
   1.168 us [1363218] | } /* a */