lwfinger/rtw89

4k lines of "SER catches error" make the system hang for a couple of minutes

ikus060 opened this issue · 3 comments

I'm reporting an issue with rtw89. From time to time, the system completely hangs for a while.

dmesg displays about four thousand "SER catches error" lines:

Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x10000000004 flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x1000000001c flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x10000000054 flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x1000000008c flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x100000000c4 flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0xfffffffffc flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x10000000024 flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x10000000034 flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x10000000034 flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0002 address=0x1000000003c flags=0x0030]
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: FW status = 0xd9001100
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: FW BADADDR = 0x0
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: FW EPC/RA = 0x0
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: FW MISC = 0x0
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_HALT_C2H = 0x1000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_SER_DBG_INFO = 0x1000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f78
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2011a02c
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2011a02c
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f9c
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f7c
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f72
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f96
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2011a03a
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f72
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119f94
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2000c1f6
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2011a02e
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2011a016
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x20119e64
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: [ERR]fw PC = 0x2011bd14
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: --->
                               err=0x1000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_SER_DBG_INFO =0x01000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_DMAC_ERR_ISR=0x00004000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_DMAC_ERR_IMR=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_WDE_ERR_FLAG_CFG=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_PLE_ERR_FLAG_CFG=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_PLE_ERRFLAG_MSG=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_WDE_ERRFLAG_MSG=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_PLE_DBGERR_LOCKEN=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_PLE_DBGERR_STS=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_HAXIDMA_ERR_IMR=0x000000ff
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_HAXIDMA_ERR_ISR=0x00000002
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_CMAC_ERR_ISR [0]=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_CMAC_FUNC_EN [0]=0xf000803f
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_CK_EN [0]=0xffffffff
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_CMAC_ERR_IMR [0]=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_RPQ_RXBD_IDX =0xdeadbeef
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_DBG_ERR_FLAG=0x00000000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: R_AX_LBC_WATCHDOG=0xdeadbeef
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: <---
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: SER catches error: 0x1000
Dec 17 15:11:50 pop-os kernel: rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599

[...]
[  450.084276] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.084282] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.084288] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.084294] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.084304] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.084313] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.084319] rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
[  450.086061] ieee80211 phy0: Hardware restart was requested
[  450.086063] rtw89_8852ce 0000:04:00.0: sec cam entry is empty

Eventually, the system seems to reset the hardware, and it works again.

I've already disabled power-save mode as follows:
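
To put a number on how many SER messages a single hang produces, and which error codes dominate, the log can be tallied with a short script. This is only a sketch: the regular expression is modelled on the message format visible in the logs above, and the function name `count_ser_errors` is made up for illustration.

```python
import re
from collections import Counter

# Modelled on lines such as:
#   rtw89_8852ce 0000:04:00.0: SER catches error: 0x2599
SER_RE = re.compile(r"(rtw89_\w+) (\S+): SER catches error: (0x[0-9a-fA-F]+)")


def count_ser_errors(log_text):
    """Return a Counter keyed by (driver, pci_address, error_code)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SER_RE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


def format_summary(counts):
    """Render the counts as text, most frequent error code first."""
    rows = []
    for (driver, addr, code), n in counts.most_common():
        rows.append(f"{n:6d}  {driver} {addr}  {code}")
    return "\n".join(rows)
```

Feeding it the saved output of `dmesg` or `journalctl -k`, for example `print(format_summary(count_ser_errors(open("dmesg.txt").read())))` with a hypothetical `dmesg.txt`, gives one line per device and error code, which makes it easier to compare kernels.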

$ cat /etc/modprobe.d/rtw8852be.conf 
options rtw89_pci disable_aspm_l1=y disable_aspm_l1ss=y
options rtw89pci disable_aspm_l1=y disable_aspm_l1ss=y
options rtw89_core disable_ps_mode=y
options rtw89core disable_ps_mode=y
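
Whether these options actually reached the loaded module can be checked through sysfs, where readable module parameters appear under `/sys/module/<module>/parameters/`. The sketch below assumes the rtw89 parameters are exported there (this depends on how the driver declares them) and uses a hypothetical helper name, `read_module_params`. A value of `None` means the module is not loaded under that name or the parameter is not exposed.

```python
from pathlib import Path


def read_module_params(module, names, sysfs_root="/sys/module"):
    """Return {param_name: value or None} for a loaded kernel module.

    Boolean module parameters normally read back as "Y" or "N".
    """
    params_dir = Path(sysfs_root) / module / "parameters"
    result = {}
    for name in names:
        try:
            result[name] = (params_dir / name).read_text().strip()
        except OSError:
            # Module not loaded under this name, or parameter not exported.
            result[name] = None
    return result
```

For example, `read_module_params("rtw89_pci", ["disable_aspm_l1", "disable_aspm_l1ss"])` and `read_module_params("rtw89_core", ["disable_ps_mode"])` cover the options in the file above; the `rtw89pci` / `rtw89core` spellings can be queried the same way.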

This happens on Linux 6.6.6. I was on another version for weeks without this problem, so it is possibly a regression introduced in 6.6.6. I will need to test with a previous kernel.

uname -a
Linux pop-os 6.6.6-76060606-generic #202312111032~1702306143~22.04~d28ffec SMP PREEMPT_DYNAMIC Mon D x86_64 x86_64 x86_64 GNU/Linux
[2215864.006181] rtw89_8852be 0000:2d:00.0: R_AX_CK_EN [0]=0xffffffff
[2215864.006185] rtw89_8852be 0000:2d:00.0: R_AX_CMAC_ERR_IMR [0]=0x00000000
[2215864.006189] rtw89_8852be 0000:2d:00.0: R_AX_RPQ_RXBD_IDX =0x007d007d
[2215864.006192] rtw89_8852be 0000:2d:00.0: R_AX_DBG_ERR_FLAG=0x00000000
[2215864.006196] rtw89_8852be 0000:2d:00.0: R_AX_LBC_WATCHDOG=0x00000081
[2215864.006196] rtw89_8852be 0000:2d:00.0: <---
[2215864.006197] rtw89_8852be 0000:2d:00.0: SER catches error: 0x1000
[2215864.056545] rtw89_8852be 0000:2d:00.0: rtw89: failed to leave lps state
[2215864.059348] rtw89_8852be 0000:2d:00.0: FW status = 0xa001100
[2215864.059457] rtw89_8852be 0000:2d:00.0: FW BADADDR = 0x77
[2215864.059465] rtw89_8852be 0000:2d:00.0: FW EPC/RA = 0x0
[2215864.059470] rtw89_8852be 0000:2d:00.0: FW MISC = 0xb898828b
[2215864.059474] rtw89_8852be 0000:2d:00.0: R_AX_HALT_C2H = 0x1001
[2215864.059477] rtw89_8852be 0000:2d:00.0: R_AX_SER_DBG_INFO = 0xf8000001
[2215864.059486] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a007
[2215864.059513] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a0eb
[2215864.059617] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a0d1
[2215864.059644] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a0eb
[2215864.059671] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a0d3
[2215864.059698] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a011
[2215864.059725] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a0f1
[2215864.059752] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a033
[2215864.059779] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a01d
[2215864.059806] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a007
[2215864.059909] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a00f
[2215864.059936] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a021
[2215864.060039] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a105
[2215864.060066] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a017
[2215864.060169] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb898a01f
[2215864.060179] rtw89_8852be 0000:2d:00.0: SER catches error: 0x1001
[2215864.060492] rtw89_8852be 0000:2d:00.0: FW status = 0xa008100
[2215864.060498] rtw89_8852be 0000:2d:00.0: FW BADADDR = 0x77
[2215864.060502] rtw89_8852be 0000:2d:00.0: FW EPC/RA = 0x0
[2215864.060506] rtw89_8852be 0000:2d:00.0: FW MISC = 0xb898828b
[2215864.060510] rtw89_8852be 0000:2d:00.0: R_AX_HALT_C2H = 0x1002
[2215864.060513] rtw89_8852be 0000:2d:00.0: R_AX_SER_DBG_INFO = 0xf8000001
[2215864.060522] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb890cdeb
[2215864.060550] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a79ff
[2215864.060576] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb8935ecf
[2215864.060603] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a9053
[2215864.060630] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb897d195
[2215864.060644] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89acc9d
[2215864.060672] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a9743
[2215864.060693] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb8935ebb
[2215864.060720] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a37e5
[2215864.060747] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb892ceaf
[2215864.060774] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89360db
[2215864.060801] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a980d
[2215864.060828] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a9713
[2215864.060856] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a3191
[2215864.060883] rtw89_8852be 0000:2d:00.0: [ERR]fw PC = 0xb89a9249
[2215864.060894] rtw89_8852be 0000:2d:00.0: SER catches error: 0x1002
[2215864.071274] rtw89_8852be 0000:2d:00.0: c2h class 1 func 3 not support
uname -a
Linux bit01 6.2.16 #1 SMP PREEMPT_DYNAMIC Tue Oct 17 12:34:47 CST 2023 x86_64 x86_64 x86_64 GNU/Linux

I encountered the same issue on a Lenovo laptop running Ubuntu with kernel 6.2.16.

I upgraded the kernel to 6.6.7, and the problem seems to have disappeared. I've also disabled ASPM.

cat /etc/modprobe.d/rtw8852be.conf
# set options for faulty HP and Lenovo BIOS code
options rtw89_pci disable_aspm_l1=y disable_aspm_l1ss
options rtw89pci disable_aspm_l1=y disable_aspm_l1ss
options rtw89_core disable_ps_mode=y
options rtw89core disable_ps_mode=y
uname -a
Linux bit01 6.6.7-060607-generic #202312131837 SMP PREEMPT_DYNAMIC Wed Dec 13 19:05:40 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

If you want to set "disable_aspm_l1ss", you need to add "=y" to the end of the line, but it does not matter as the SER lines are gone.
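
A missing "=y" like this is easy to overlook, so a small check over the modprobe.d file can help. The following is a rough sketch, not part of rtw89 or modprobe: `find_valueless_options` is a made-up name, and the parsing is simplified (it treats everything after a `#` as a comment and only looks at `options` lines). It reports every option written without an explicit `=value`.

```python
def find_valueless_options(conf_text):
    """Return (module, option) pairs from 'options' lines that lack '=value'."""
    flagged = []
    for raw in conf_text.splitlines():
        # Simplification: drop anything after '#', then split on whitespace.
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 3 or fields[0] != "options":
            continue
        module = fields[1]
        for opt in fields[2:]:
            if "=" not in opt:
                flagged.append((module, opt))
    return flagged
```

Run against the configuration quoted above, it would flag `disable_aspm_l1ss` on both the `rtw89_pci` and `rtw89pci` lines.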