Segmentation fault with uftrace dump --flame-graph
innerout opened this issue · 8 comments
Hi,
I am trying to run uftrace dump --flame-graph with a small test that I have. Although the record, report, and graph commands work fine, the --flame-graph option throws a segmentation fault.
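For reference, the command sequence was roughly the following (the record command is taken from the uftrace info output below; the others are reconstructions of what I ran):
$ uftrace record -k tests/test_par_threadpool
$ uftrace report                # works
$ uftrace graph                 # works
$ uftrace dump --flame-graph    # segmentation fault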
I am attaching the uftrace.data directory that was generated from the execution.
(Edit) I cannot attach uftrace.data since it is 28 MB and GitHub has a 25 MB limit for attachments.
From a small investigation with gdb I found that uftrace crashes here:
0x0000555555591e75 in graph_get_task (task=0x0, tg_size=tg_size@entry=56)
at /usr/src/debug/uftrace/uftrace-0.16/utils/graph.c:41
41 if (tg->task->tid == task->tid)
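For what it's worth, the crash pattern is a plain NULL-pointer dereference: line 41 compares tg->task->tid against task->tid, and the load of task->tid faults when task is NULL. A minimal self-contained illustration of the same pattern (struct names and layouts here are simplified guesses, not uftrace's real definitions):

#include <stddef.h>

/* Simplified stand-ins for uftrace's task structures (assumed layout). */
struct task { int tid; };
struct task_graph { struct task *task; };

static struct task_graph *lookup(struct task_graph *tg, struct task *task)
{
	/* mirrors utils/graph.c:41: if (tg->task->tid == task->tid) */
	if (tg->task->tid == task->tid) /* SIGSEGV when task == NULL */
		return tg;
	return NULL;
}

int main(void)
{
	struct task t = { .tid = 205624 };
	struct task_graph tg = { .task = &t };

	lookup(&tg, &t);   /* fine */
	lookup(&tg, NULL); /* crashes, like the backtrace above */
	return 0;
}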
Also, here is the uftrace info output:
# system information
# ==================
# program version : v0.16 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
# recorded on : Mon Jun 3 14:30:44 2024
# cmdline : uftrace record -k tests/test_par_threadpool
# cpu info : Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
# number of cpus : 8 / 8 (online / possible)
# memory info : 15.7 / 31.2 GB (free / total)
# system load : 2.64 / 2.94 / 2.95 (1 / 5 / 15 min)
# kernel version : Linux 6.9.3-zen1-1-zen
# hostname : kernel
# distro : "EndeavourOS"
#
# process information
# ===================
# number of tasks : 5
# task list : 205624(test_par_thread), 205645(bg_worker), 205646(bg_worker), 205647(bg_worker), 205648(bg_worker)
# exe image : /home/gxanth/gitfolders/parallax/build/tests/test_par_threadpool
# build id : dec3c32d8d5412209b5447be9d2c69ffb4351508
# pattern : regex
# exit status : exited with code: 0
# elapsed time : 1.018009329 sec
# cpu time : 2.222 / 0.983 sec (sys / user)
# context switch : 14094 / 657 (voluntary / involuntary)
# max rss : 6920 KB
# page fault : 0 / 1005 (major / minor)
# disk iops : 0 / 8 (read / write)
Thanks for the report. It's strange that task is NULL when graph_get_task() is called. The flame graph logic is handled in dump_flame_task_rstack(), which calls graph_get_task(). But it also accesses task->rstack, so it should have hit the problem there first. Also, I think the code should not allow task to be NULL in the first place.
Can you please upload the full backtrace from GDB?
Also, you can try the --tid option to limit the tasks during uftrace dump. Note that it accepts a comma-separated list to specify multiple tasks.
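For example, with the tids from the info output above, invocations would look like this:
$ uftrace dump --flame-graph --tid 205624
$ uftrace dump --flame-graph --tid 205645,205646,205647,205648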
The uftrace.data from the run is available at https://files.fm/u/cs3pkm6h27. I don't know if it can be used on a different computer.
Here is the full backtrace: gdb.txt
Thanks, I can reproduce the problem. FYI, uftrace is designed to be able to analyze data from a different machine and even from a different architecture.
I'll take a look at the bug.
Keep up the good work, and thank you very much for all the information!
Hmm.. sometimes kernel trace data contains unrelated tasks, which makes get_task_handle() return NULL. This would fix the problem, but I need to take a look at whether there is more to fix.
diff --git a/cmds/dump.c b/cmds/dump.c
index 75a81373..2ad53eb1 100644
--- a/cmds/dump.c
+++ b/cmds/dump.c
@@ -1152,6 +1152,8 @@ static void dump_flame_kernel_rstack(struct uftrace_dump_ops *ops,
 	tid = kernel->tids[cpu];
 	task = get_task_handle(kernel->handle, tid);
+	if (task == NULL)
+		return;
 	graph = graph_get_task(task, sizeof(*graph));
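If you want to try it before a fix lands, applying the patch and rebuilding should be enough (the patch file name below is hypothetical):
$ git apply null-task.patch
$ make
$ ./uftrace dump --flame-graph -d uftrace.data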
I also see that the report output is invalid. I think there's a mismatch between ENTRY and EXIT records, probably with sched in/out events:
$ uftrace report | head
Total time Self time Calls Function
========== ========== ========== ====================
25.000 h 25.000 h 890300 x64_sys_call
25.000 h 25.000 h 14686 switch_fpu_return
2.077 s 574.590 ms 1022312 pthread_mutex_lock
1.360 s 580.720 ms 1022312 pthread_mutex_unlock
1.002 s 5.955 us 1 main
1.000 s 20.558 us 1 sleep
612.461 ms 616.700 ms 13912 linux:schedule
77.251 ms 77.411 ms 570 linux:schedule (pre-empted)
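One way such a mismatch can blow up the totals: total time is essentially the exit timestamp minus the entry timestamp, so if an ENTRY gets paired with the wrong (older) EXIT, an unsigned subtraction underflows into an enormous duration. A minimal sketch of that failure mode (an illustration of the arithmetic only, not uftrace's actual accounting code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t entry_ts = 2000; /* ns, from an ENTRY record */
	uint64_t exit_ts = 1000;  /* wrong EXIT paired, e.g. from a sched event */

	/* unsigned underflow: wraps to ~2^64 ns instead of a small delta */
	uint64_t total = exit_ts - entry_ts;

	printf("bogus total: %llu ns\n", (unsigned long long)total);
	return 0;
}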
Thanks! I will test it and report back if there are any issues!