suspend / dpms via swayidle causing high CPU usage / instability in firefox?
t3e opened this issue · 12 comments
I'm using
- debian 12
- sway 1.7 (version from debian repo)
- swayidle 1.8 (debian repo)
- firefox 123.0.1 (from mozilla apt source)
The issue I'm facing is that when I call
systemctl suspend
as a timeout command like in this example:
exec swayidle -w \
timeout 10 'swaylock -f -c 000000' \
timeout 20 'swaymsg "output * dpms off"' resume 'swaymsg "output * dpms on"' \
timeout 30 'systemctl suspend' resume 'swaymsg "output * dpms on"' \
before-sleep 'swaylock -f -c 000000'
... then I noticed a strange phenomenon with firefox after the wakeup. It predictably happens everytime I execute this with an opened browser window, after resuming I see one CPU core beeing fully utilized by a firefox process, even when there was nearly 0 activity before and it stays this way. Sometimes I even had crashes with firefox afterwards.
It does not happen when i call systemctl suspend
(without using swayidle), so I suspect there must be some strange interference.
I also noticed one time than even when I close and reopen firefox, the cpu fan doesn't slow down, even when there's no more activity, which is really beyond strange and hard for me to understand...
I have come to the conclusion that it must be related to swayidle or some underlying system that it uses (and probably also firefox itself but since it so far only occurs with swayidle I think it might be the primary culprit).
This happens also with a "blank" profile in firefox without any extensions. When opening about:performance it is the main thread in firefox that is using 100% cpu time.
As a side note: What I also remember is sway segfaulting at one point (with complete crash back to tty). Unfortunately I can't remember how that happend and I'm unable to reproduce it.
The issue is very unlikely to be swayidle itself - it more or less just runs specified commands in order.
One notable difference between just running systemctl suspend
and your swayidle setup is that your swayidle setup locks the screen, then disables outputs, then suspends, then enables outputs. If you just ran those things through a script with equivalent sleeps in between, you'll likely get the same result.
It turns out you're absolutely right, the following 2 lines are enough to reproduce the error
systemctl suspend
swaymsg "output * dpms on"
So i guess this could be an issue in sway itself? Or firefox?
After reading a bit through the error reports on sway I'm starting to suspect it's a specific driver issue with my hardware.
I have AMD integrated graphics (Ryzen 5 3400g) and for sway there are some reports about instabilities with similar platforms. I didn't have any instabilities or sililar issues before switching to sway though.
Unless Firefox burns CPU because sway spams it with traffic, the issue would be within Firefox. However, "within Firefox" could be caused by mesa (your user-space GPU driver).
htop
can show which thread is burning CPU (H
to toggle threads, t
to toggle tree mode for organization, F2
-> Display options
-> Show custom thread names
to make the thread purpose clear), might give a clue. WAYLAND_DEBUG=1
can be used to see what happens on the wire, but this will be noisy and requires a strict reproduction with no other interaction to be useful.
That issue was closed after discovering it was caused by faulty RAM.
I haven't tried this yet but I've made an interesting discovery:
When I execute
systemctl suspend
swaymsg "output [monitor1] dpms on"
swaymsg "output [monitor2] dpms on"
swaymsg "output [monitor3] dpms on"
then the problem does not occur!
So the issue is probably somehow related to the wildcard for the outputs...
That may simply be timing. I still mainly suspect Firefox, as that is where your symptoms are.
You're apparently right about the timing because it does occur when I do this
systemctl suspend
swaymsg "output [monitor1] dpms on, output [monitor2] dpms on, output [monitor3] dpms on"
And I also found out that has nothing to do with suspend at all (!), just
swaymsg "output [monitor1] dpms on, output [monitor2] dpms on"
(dpms signal to more than one output) is enough to reproduce the error (even if the monitors are already on).
The firefox thread name in htop (with 100% CPU usage) is called "WaylandProxy"
In that case, it just sounds like a bug in Firefox' WaylandProxy component - file a bug with them and see.
Closing here as it is unrelated to swayidle.
Here I don't agree as I've now recorded the wayland messages as you suggested and the protocoll looks like this.
[1122210.849] -> wl_keyboard#24.repeat_info(35, 190)
[1122210.856] -> wl_keyboard#24.modifiers(472, 0, 0, 0, 0)
[1122210.865] -> wl_keyboard#24.leave(473, wl_surface#3)
[1122210.874] -> wl_keyboard#24.enter(474, wl_surface#3, array[0])
[1122210.881] -> wl_keyboard#24.modifiers(475, 0, 0, 0, 0)
[1122210.887] -> wl_data_device#15.selection(nil)
[1122210.894] -> zwp_primary_selection_device_v1#16.selection(nil)
[1122218.421] -> wl_keyboard#9.keymap(1, fd 195, 66082)
[1122218.442] -> wl_keyboard#9.repeat_info(35, 190)
[1122218.449] -> wl_keyboard#47.keymap(1, fd 196, 66082)
[1122218.474] -> wl_keyboard#9.keymap(1, fd 112, 66082)
[1122218.480] -> wl_keyboard#9.repeat_info(35, 190)
[1122218.486] -> wl_keyboard#24.keymap(1, fd 113, 66082)
[1122218.491] -> wl_keyboard#24.repeat_info(35, 190)
[1122218.496] -> wl_keyboard#24.keymap(1, fd 118, 66082)
[1122218.502] -> wl_keyboard#24.repeat_info(35, 190)
[1122218.507] -> wl_keyboard#24.keymap(1, fd 119, 66082)
[1122218.514] -> wl_keyboard#24.repeat_info(35, 190)
------------------------------------------------------------------------------------------------------
Here i executed:
swaymsg "output [monitor1] dpms on, output [monitor2] dpms on"
------------------------------------------------------------------------------------------------------
Error: WaylandMessage::Write() too many files to send
:
[1122218.521] -> wl_keyboard#24.modifiers(476, 0, 0, 0, 0)
Error: WaylandMessage::Write() too many files to send
:
[1122218.551] -> wl_keyboard#24.leave(477, wl_surface#3)
Error: WaylandMessage::Write() too many files to send
:
[1122218.563] -> wl_keyboard#24.enter(478, wl_surface#3, array[0])
Error: WaylandMessage::Write() too many files to send
:
[1122218.571] -> wl_keyboard#24.modifiers(479, 0, 0, 0, 0)
Error: WaylandMessage::Write() too many files to send
:
[1122218.582] -> wl_data_device#15.selection(nil)
Error: WaylandMessage::Write() too many files to send
:
[1122218.602] -> zwp_primary_selection_device_v1#16.selection(nil)
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
:
Error: WaylandMessage::Write() too many files to send
After the issue occures wayland seems to send an insane amount of messages, the filesize of the log grew very quickly.
This may not be related to swayidle, but I don't think the problem here is firefox.
Ok maybe it is related to firefox, I'm not shure if I understand the output correctly (looks like firefox is producing the messages?)
https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/2051838
https://bugzilla.mozilla.org/show_bug.cgi?id=1882449
Here I don't agree
WaylandProxy is a small component Firefox uses to buffer messages. If it is written well and does not have any active bugs, then for sway to cause 100% CPU load in WaylandProxy, sway would also nneed to consume somewhere around 100% CPU load to send that much traffic to it. As such, I consider the 100% CPU load a bug in WaylandProxy until proven otherwise.
At the same time, you have error logs from said component, stating it has "too many files to send" in one direction or the other. As it received said files, the message on the wire was valid, and so it's very likely that WaylandProxy's wire protocol implementation has tripped over itself.
Please use the Firefox bug tracker to discuss Firefox issues.
Ok then, thanks for the explanation. According to the links the problem has already been fixed in version 125 of Firefox which is planned to be released next month.