Stack overflow on battery removal
rkost opened this issue · 9 comments
Description
First world problem: When removing the (only) battery that is installed on my system, battop crashes:
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
[1] 23948 abort (core dumped) battop
The terminal becomes unusable after battop crashed (random inserted characters when using the mouse, sigterm will not work).
System info:
- Arch linux, kernel 5.1.3
- Thinkpad P71
- battop 0.2.3
Additional notes
- battop works fine when not removing the battery. Temperature monitoring does not work (not available), but I guess this is something system-specific.
- The reason I am swapping batteries is to be able to charge my second one
Let me know if there is any information that is missing.
Oh, snap :( Thank you for this huge bug report.
Unfortunately, the battery in my notebook is soldered to the motherboard, so I'm not sure how can I reproduce this issue without the hammer and a screwdriver help :|
Can you attach the core dump created? I believe in can be done with a coredumpctl dump battop -o battop.dump
shell command. While it was produced from the release version (without debug symbols and etc), it still might be very helpful.
Haha, yeah. Most modern notebooks have soldered batteries these days. I probably wouldn't think of some maniac removing the battery during runtime either :D
I have compiled battop for the develop configuration and reproduced the error. Since github does not like the dump's file size I have uploaded it to my server. I hope this helps. Let me know if you need more information or want to test something.
Output of coredumpctl:
rkost@rkost-arch-p71 ~/repos/rust-battop master coredumpctl dump battop -o battop.dump
PID: 15024 (battop)
UID: 1000 (rkost)
GID: 100 (users)
Signal: 6 (ABRT)
Timestamp: Sun 2019-06-02 15:40:22 CEST (16s ago)
Command Line: ./battop
Executable: /home/rkost/repos/rust-battop/target/debug/battop
Control Group: /user.slice/user-1000.slice/session-1.scope
Unit: session-1.scope
Slice: user-1000.slice
Session: 1
Owner UID: 1000 (rkost)
Boot ID: 1ecfb713969249499bc3d20e0c424b50
Machine ID: 9799e596c3c24aed8df7672be00906b3
Hostname: rkost-arch-p71
Storage: /var/lib/systemd/coredump/core.battop.1000.1ecfb713969249499bc3d20e0c424b50.15024.1559482822000000.lz4
Message: Process 15024 (battop) of user 1000 dumped core.
Stack trace of thread 15024:
#0 0x00007fdae318d82f raise (libc.so.6)
#1 0x00007fdae3178672 abort (libc.so.6)
#2 0x000056176cc055b7 n/a (/home/rkost/repos/rust-battop/target/debug/battop)
More than one entry matches, ignoring rest.
Awesome response time btw ;)
Awesome response time btw ;)
Accidental luck, I suppose :)
Could you also attach the exact /home/rkost/repos/rust-battop/target/debug/battop
binary, which had produced that dump? It seems that our compilers suite are different, because I was not able to get anything useful from the core dump yet. That's what I got at the moment with help of gdb
and lldb
:
(gdb) bt full
#0 0x00007fdae318d82f in ?? ()
No symbol table info available.
#1 0x0000000000000400 in ?? ()
No symbol table info available.
#2 0x000056176cc196bb in ?? ()
No symbol table info available.
#3 0x0000000000000000 in ?? ()
No symbol table info available.
(lldb) bt
* thread #1, name = 'battop', stop reason = signal SIGABRT
* frame #0: 0x00007fdae318d82f
frame #1: 0x00007fdae300c1a0
(lldb) f 0
frame #0: 0x00007fdae318d82f
error: read memory from 0x7fdae318d82f failed
(lldb) f 1
frame #1: 0x00007fdae300c1a0
-> 0x7fdae300c1a0: addl %eax, (%rax)
0x7fdae300c1a2: addb %al, (%rax)
0x7fdae300c1a4: addb %al, (%rax)
0x7fdae300c1a6: addb %al, (%rax)
These in ?? ()
lines are not helpful, and locally generated battop
binary seems to have the different memory layout, so I have no idea yet what causes the panic.
I'll try to check the Thinkpad driver sources now, maybe there is some unobvious behavior in it.
So far I had not found anything interesting, except the thing that upower
does an additional check before reading the /sys/class/power_supply/
files — it checks if it is a regular file, while battop
does not care about it and this might be a problem if device driver decides to mess everything up during the battery unplugging.
It still does not explain the stack overflow and I doubt that this is a reason, since buffer for a file reading is located at heap, but it is worth to check it out.
May I also ask you to run that command before and after the battery swap? Obviously, if you are okay with that.
$ find -L /sys/class/power_supply -maxdepth 2 -printf "\n%p\n" -exec cat {} \;
Amazing, thank you very much! That backtrace is really helpful, here is a short explanation for this bug:
battop
is backed by a battery cratebattery
is creating astruct
for each battery found, which represents the battery device and stores the path to a/sys/class/power_supply/*
folder- This struct is cached by
battop
and information in it is refreshed each second (except the path, which is "constant", since batteries are just plugged in usually ;) ) - When you are unplugging the battery, Linux removes the respective
/sys/class/power_supply/*
folder, butbattop
still has the cached path, so.. - When it tries to read the new information, there are no files exists and due to another well hidden bug at these lines it falls into a infinite recursion :|
This was an interesting investigation, I should say :) Thank you once again for your invaluable help!
I've created the issues to handle these bugs and will try to fix them and publish the new battop
version tomorrow:
I'm sorry for a delay.
So, all related bugs were fixed (and it seems that they were caused by a stupid copy-n-paste error) and necessary changes were made in the issue-8 branch.
I'm attaching the "debug" version build from this branch, if you want to test it out: battop.gz, or, as usual, with a git, you can check out the issue-8
branch and build it manually.
Since the bug should be covered now, I'll wait till tomorrow and publish the new version then.
UPD: With a current behavior battop
will close gracefully with an exit code > 0, which it still better than panicking and messing up the terminal :)
Nice! Can confirm that battop will just close with some exit code != 0. I guess an error message would be nice for the future but for now this is totally fine. Thanks for fixing!
You are absolutely right, this is not the best user experience, but I'm not sure yet how exactly should battop
behave in this case. I had started working on this, see the svartalf/rust-battery#31 issue, so it will be better in the next versions.
Thank you for the bug report and the testing, there would be no chance I would fix that alone :)
Release process is finished already and AUR package is up to date now.