Irqbalance/irqbalance

irqbalance-ui: aborts with coredump if irqbalance is started with IRQBALANCE_BANNED_CPUS


IRQBALANCE_BANNED_CPULIST=32-47,48-63 ./irqbalance

irqbalance-ui

free(): invalid pointer
Aborted (core dumped)

ll /var/lib/systemd/coredump/

total 128
-rw-r-----. 1 root root 61291 Jul 15 11:31 core.irqbalance-ui.0.d97e90f059414c1b8e0d0eb1d00c8c14.4067.1657864890000000.zst
-rw-r-----. 1 root root 61223 Jul 15 11:36 core.irqbalance-ui.0.d97e90f059414c1b8e0d0eb1d00c8c14.4101.1657865160000000.zst

Jul 15 11:36:00 localhost.localdomain systemd[1]: Started Process Core Dump (PID 4102/UID 0).
Jul 15 11:36:00 localhost.localdomain systemd-coredump[4103]: Process 4101 (irqbalance-ui) of user 0 dumped core.

                                                                       Module /root/irqbalance/irqbalance-ui with build-id 88bdf0081453ff408adb5d6b74d7cfacb91e4a7b
                                                                       Module linux-vdso.so.1 with build-id 7c3e210917108833f13aaa2b8470456d7118448e
                                                                       Module ld-linux-x86-64.so.2 with build-id d66f437b27ec0a0a70d480f7731f9c9aafd98bad
                                                                       Module libpcre.so.1 with build-id cffb947bcc416dca3cd249cdb0a1c6f614549c30
                                                                       Module libc.so.6 with build-id 79ee25245bb9d11d30e095e7ee2629aa4fe4dbf6
                                                                       Module libtinfo.so.6 with build-id 7745adf36f8d068cdf99dc45bab9352ade38b6eb
                                                                       Module libncursesw.so.6 with build-id 25554c31777f891c014487b5dd91b2d198aa1941
                                                                       Module libm.so.6 with build-id 07bcee7dd6b3c9dda6a73fd434e2560632e3241e
                                                                       Module libglib-2.0.so.0 with build-id addb8fcb7df102ae4897fec40e395bcfb4f4ca59
                                                                       Stack trace of thread 4101:
                                                                       #0  0x00007f4bbd28642c __pthread_kill_implementation (libc.so.6 + 0xa642c)
                                                                       #1  0x00007f4bbd239d06 raise (libc.so.6 + 0x59d06)
                                                                       #2  0x00007f4bbd20c7d3 abort (libc.so.6 + 0x2c7d3)
                                                                       #3  0x00007f4bbd27a567 __libc_message (libc.so.6 + 0x9a567)
                                                                       #4  0x00007f4bbd29043c malloc_printerr (libc.so.6 + 0xb043c)
                                                                       #5  0x00007f4bbd291d4c _int_free (libc.so.6 + 0xb1d4c)
                                                                       #6  0x00007f4bbd2947d5 free (libc.so.6 + 0xb47d5)
                                                                       #7  0x0000000000402c25 n/a (/root/irqbalance/irqbalance-ui + 0x2c25)
                                                                       ELF object binary architecture: AMD x86-64

The problem also happens if the system is booted with the isolcpus and nohz_full parameters.

On another system, I got the following output.

munmap_chunk(): invalid pointer
Aborted (core dumped)

That system was booted with:
isolcpus=72-75,90-99,108-115,126-140 nohz_full=72-75,90-99,108-115,126-140

Jul 15 02:38:33 localhost.localdomain systemd[1]: Started Process Core Dump (PID 38659/UID 0).
Jul 15 02:38:34 localhost.localdomain systemd-coredump[38660]: Process 38658 (irqbalance-ui) of user 0 dumped core.

                                                                                  Stack trace of thread 38658:
                                                                                  #0  0x00007f5f4c705a4f raise (libc.so.6)
                                                                                  #1  0x00007f5f4c6d8db5 abort (libc.so.6)
                                                                                  #2  0x00007f5f4c748057 __libc_message (libc.so.6)
                                                                                  #3  0x00007f5f4c74f1bc malloc_printerr (libc.so.6)
                                                                                  #4  0x00007f5f4c74f46c munmap_chunk (libc.so.6)
                                                                                  #5  0x00000000004020f5 n/a (/root/irqbalance/irqbalance-ui)

Thanks,

@liuchao173 this is almost certainly related to one of your recent UI changes, please investigate asap

@vishal14051992 if you could run the UI utility under gdb, and provide a line-accurate backtrace, it would help identify the problem.

@vishal14051992 I can't reproduce it in my environment. Can you run the UI utility under gdb and provide a line-accurate backtrace?

Hello,

I compiled the latest irqbalance code from GitHub. Here are the detailed steps that I performed.

# git clone https://github.com/Irqbalance/irqbalance.git
# git describe
v1.6.0-189-g56a9a0f

# ./autogen.sh
# ./configure
# make

Then I started irqbalance with a banned CPU.

# IRQBALANCE_BANNED_CPULIST=65 ./irqbalance
# gdb ./irqbalance-ui
(gdb) run
free(): invalid pointer

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	      return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7bcc493 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007ffff7b7fd06 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7b527d3 in __GI_abort () at abort.c:79
#4  0x00007ffff7bc0567 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7ce659a "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#5  0x00007ffff7bd643c in malloc_printerr (str=str@entry=0x7ffff7ce41e7 "free(): invalid pointer") at malloc.c:5536
#6  0x00007ffff7bd7d4c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4327
#7  0x00007ffff7bda7d5 in __GI___libc_free (mem=mem@entry=0x406010) at malloc.c:3279
#8  0x0000000000402c25 in parse_setup (setup_data=<optimized out>) at ui/irqbalance-ui.c:191
#9  0x0000000000403965 in parse_setup (setup_data=setup_data@entry=0x52aa00 "SLEEP 10 BANNED 00000002,00000000,00000000") at ui/irqbalance-ui.c:207
#10 0x0000000000405958 in display_tree () at ui/ui.c:797
#11 0x0000000000405a6e in init () at ui/ui.c:682
#12 0x00000000004024a7 in main (argc=<optimized out>, argv=<optimized out>) at ui/irqbalance-ui.c:533
(gdb) 

I hope this helps. Let me know if anything else is required.
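For reference, the BANNED value in frame #9 of the backtrace above (00000002,00000000,00000000) is consistent with CPU 65 being written as a cpumask of comma-separated 32-bit hex words, most significant word first. The sketch below reproduces that string. The helper name format_single_cpu_mask and the three-word width are assumptions chosen for illustration; this is not code from irqbalance.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from irqbalance): render a mask in which only
 * `cpu` is set, as `nwords` comma-separated 32-bit hex words, most
 * significant word first. Returns 0 on success, -1 if the CPU does not fit
 * in the mask or the buffer is too small. */
int format_single_cpu_mask(unsigned cpu, unsigned nwords, char *buf, size_t len)
{
    size_t used = 0;

    if (cpu / 32 >= nwords)
        return -1;

    for (unsigned w = nwords; w-- > 0; ) {
        /* Only the word that contains `cpu` has a bit set. */
        uint32_t word = (cpu / 32 == w) ? (UINT32_C(1) << (cpu % 32)) : 0;
        int n = snprintf(buf + used, len - used, "%08x%s",
                         (unsigned)word, w ? "," : "");
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Calling format_single_cpu_mask(65, 3, buf, sizeof buf) fills buf with 00000002,00000000,00000000. With a single word (32 CPUs or fewer) the output contains no ',' at all, so a parser that walks the mask character by character only meets the separator on larger machines.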

I see. My environment doesn't have enough CPUs to reproduce this. When hex_to_bitmap processes the ',' separator, it returns the string literal "0000\0" directly. That map is later freed in parse_setup, even though it was not allocated with malloc. I'll fix this bug as soon as possible.
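One way to avoid this class of bug, following the diagnosis above, is to make every return path hand back heap memory so the caller can always call free(). The sketch below is a minimal illustration under that assumption. The name hex_to_bitmap_sketch, the most-significant-bit-first order, and the choice to map non-hex characters to "0000" are hypothetical; this is not the fix that was actually committed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical rewrite: convert one hex digit to a 4-character bit string.
 * Every successful return is malloc'ed, so the caller may always free() it.
 * Returns NULL only if the allocation fails. */
char *hex_to_bitmap_sketch(char hex_digit)
{
    uint8_t digit = 0;

    if (hex_digit >= '0' && hex_digit <= '9')
        digit = (uint8_t)(hex_digit - '0');
    else if (hex_digit >= 'a' && hex_digit <= 'f')
        digit = (uint8_t)(hex_digit - 'a' + 10);
    else if (hex_digit >= 'A' && hex_digit <= 'F')
        digit = (uint8_t)(hex_digit - 'A' + 10);
    /* Any other character (for example the ',' separator) leaves digit at 0
     * and takes the same allocation path, instead of returning a literal. */

    char *bitmap = malloc(5);
    if (bitmap == NULL)
        return NULL;

    /* Emit the four bits, most significant first. */
    for (int i = 0; i < 4; i++)
        bitmap[i] = (digit & (8 >> i)) ? '1' : '0';
    bitmap[4] = '\0';
    return bitmap;
}
```

An alternative with the same effect would be for the caller to skip ',' before calling the conversion function at all. Either way, the pointer passed to free() must have come from malloc.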