namhyung/uftrace

ERROR : writing shmem name to pipe: Broken pipe

hyunjunpark-1 opened this issue · 9 comments

Hi namhyung & honggyu
I'm using for performance optimizing for some app

This app start from systemd as below
-/usr/bin/uftrace -d /tmp/uftrace.data --no-libcall record "my app path"

When I stop the record by systemctl, error occur as below

mcount: /usr/src/debug/uftrace/1.0-r0/git/libmcount/misc.c:103:uftrace_send_message
 ERROR: writing shmem name to pipe: Broken pipe
mcount: /usr/src/debug/uftrace/1.0-r0/git/libmcount/misc.c:103:uftrace_send_message
 ERROR: writing shmem name to pipe: Broken pipe
mcount: /usr/src/debug/uftrace/1.0-r0/git/libmcount/misc.c:103:uftrace_send_message
 ERROR: writing shmem name to pipe: Broken pipe

In the uftrace dump file, there is very simple information such as 'main,termhandler, Poco::net'
But , I know that, this app using many librarys not only Poco libs
Can you help me about the uftrace record?

# cpu info            : ARM64 (v8)
# kernel version      : Linux 4.4.162

Hi @hyunjunpark-1,

Thanks for sharing the crash info. It'd be more helpful if you can share with more verbose log messages. Could you run uftrace as follows?

$ uftrace record -d /tmp/uftrace.data -vvv --logfile=/tmp/uftrace.log --no-libcall app

The uftrace.log will contain more verbose execution log and we need to examine how uftrace is finished with the ERROR log you shared.

It'd be much better to share the uftrace.log file, but if not possible. Please remove some internally sensitive info and share some of its snippet. Thanks.

It seems #802 has a similar problem.

I remember that the error log in #802 was took from AArch64 machine, not from x86_64. I tested it once again in x86_64 machine, but it works fine.

I upload uftrace log

uftrace.log

Hmm.. I don't see anything strange from the log file. Is it possible to run the program from the shell directly instead of running it from systemd script?

There were some issues when the program was executed from systemd and it worked fine after running it standalone.

@hyunjunpark-1 Could you please test below patch?

diff --git a/libmcount/mcount.c b/libmcount/mcount.c
index d8e4617..2a83299 100644
--- a/libmcount/mcount.c
+++ b/libmcount/mcount.c
@@ -579,9 +579,6 @@ void mtd_dtor(void *arg)
        if (mtdp->dead)
                return;
 
-       if (mcount_should_stop())
-               mcount_trace_finish(true);
-
        /* this thread is done, do not enter anymore */
        mtdp->recursion_marker = true;
        mtdp->dead = true;
@@ -604,6 +601,9 @@ void mtd_dtor(void *arg)
        tmsg.time = mcount_gettime();
 
        uftrace_send_message(UFTRACE_MSG_TASK_END, &tmsg, sizeof(tmsg));
+
+       if (mcount_should_stop())
+               mcount_trace_finish(true);
 }
 
 void __mcount_guard_recursion(struct mcount_thread_data *mtdp)
diff --git a/libmcount/misc.c b/libmcount/misc.c
index b6538b3..8a04086 100644
--- a/libmcount/misc.c
+++ b/libmcount/misc.c
@@ -100,7 +100,7 @@ void uftrace_send_message(int type, void *data, size_t len)
        len += sizeof(msg);
        if (writev(pfd, iov, 2) != (ssize_t)len) {
                if (!mcount_should_stop())
-                       pr_err("writing shmem name to pipe");
+                       pr_err("writing shmem name to pipe: %d", type);
        }
 }
 

I applied @namhyung's patch , I saw same issue on Erlang Runtime BEAM.

I saw the type 5 on ERROR: writing shmem name to pipe: 5: Bad file descriptor.

~/otp$ uftrace record bin/escript factorial.erl 3
mcount: /home/kim/uftrace/libmcount/misc.c:85:uftrace_send_message
 ERROR: writing shmem name to pipe: 5: Bad file descriptor
erl_child_setup closed

Crash dump is being written to: erl_crash.dump...done

I checked msg type on uftrace.h.

/* msg format for communicating by pipe */
struct uftrace_msg {
	unsigned short magic; /* UFTRACE_MSG_MAGIC */
	unsigned short type; /* UFTRACE_MSG_* */
	unsigned int len;
	unsigned char data[];
};

enum uftrace_msg_type {
	UFTRACE_MSG_REC_START = 1,
	UFTRACE_MSG_REC_END,
	UFTRACE_MSG_TASK_START,
	UFTRACE_MSG_TASK_END,
	UFTRACE_MSG_FORK_START,
	UFTRACE_MSG_FORK_END,
	UFTRACE_MSG_SESSION,
	UFTRACE_MSG_LOST,
	UFTRACE_MSG_DLOPEN,
	UFTRACE_MSG_FINISH,
...

I need to check #1883 .

#1883 merged, I don't think this issue is going to happen anymore, but if it does, someone can open a new issue.