kaloz/mwlwifi

WPA3 could crash/hang the router (WRT3200ACM)

lss4 opened this issue · 43 comments

lss4 commented

I'm currently trying OpenWrt master on this router, and I found that using WPA3 can crash the entire router. It seems to happen when a device leaves the access point (disconnecting or switching to another AP, preferably on the same device), or when I restart the wireless adapter after changing some settings.

When it happens, the wireless adapter stops working first (all APs disappear) and LuCI starts to become unresponsive. After a few minutes all internet connectivity was cut and LuCI stopped working completely (timeout). At least I can SSH into the router and manually reboot it.

When the problem starts to happen, following messages can be seen in the system log.

[  275.710196] ieee80211 phy1: cmd 0x9122=UpdateEncryption timed out
[  275.716330] ieee80211 phy1: return code: 0x1122
[  275.720879] ieee80211 phy1: timeout: 0x1122
[  275.725087] wlan1: failed to remove key (0, 80:a5:89:c7:e9:53) from hardware (-5)
[  275.732747] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  295.734426] ieee80211 phy1: cmd 0x9111=SetNewStation timed out
[  295.740288] ieee80211 phy1: return code: 0x1111
[  295.744848] ieee80211 phy1: timeout: 0x1111
[  295.752663] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
[  315.745685] ieee80211 phy1: cmd 0x801d=MEMAddrAccess timed out
[  315.751554] ieee80211 phy1: return code: 0x001d
[  315.756103] ieee80211 phy1: timeout: 0x001d
[  315.760317] ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000

The router is mostly unresponsive and these log entries are being generated very slowly. It appears as if everything regarding wireless is timing out and is blocking the router, causing certain functionalities to slow down or stop responding.
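The slow pace of those log entries can be read straight off the timestamps. Below is a minimal Python sketch, assuming only the dmesg line format pasted above (the regular expression and the function names are illustrative, not part of mwlwifi). It pulls out each timed-out firmware command and the gap between consecutive timeouts.

```python
import re

# Excerpt of the log lines pasted above (timeout and return-code lines only).
LOG = """\
[  275.710196] ieee80211 phy1: cmd 0x9122=UpdateEncryption timed out
[  275.716330] ieee80211 phy1: return code: 0x1122
[  295.734426] ieee80211 phy1: cmd 0x9111=SetNewStation timed out
[  295.740288] ieee80211 phy1: return code: 0x1111
[  315.745685] ieee80211 phy1: cmd 0x801d=MEMAddrAccess timed out
[  315.751554] ieee80211 phy1: return code: 0x001d
"""

# Pattern written against the lines above; other driver versions may log differently.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ieee80211 (?P<phy>phy\d+): "
    r"cmd 0x(?P<code>[0-9a-fA-F]{4})=(?P<name>\w+) timed out"
)


def parse_timeouts(text):
    """Return (timestamp, phy, command code, command name) per timed-out command."""
    events = []
    for line in text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            events.append((float(m["ts"]), m["phy"], int(m["code"], 16), m["name"]))
    return events


def gaps(events):
    """Seconds between consecutive timeouts, rounded to 0.1 s."""
    return [round(b[0] - a[0], 1) for a, b in zip(events, events[1:])]


events = parse_timeouts(LOG)
for ts, phy, code, name in events:
    print(f"{ts:10.3f}  {phy}  0x{code:04x}  {name}")
print("seconds between timeouts:", gaps(events))
```

On this excerpt the timeouts are about 20 seconds apart, which fits the impression that every wireless operation waits for a command that never completes.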

After switching all APs back to WPA2, the issues went away. I did not know mwlwifi had issues with WPA3 until recently, when I was writing up the issues I encountered with this router on OpenWrt master during the build and installation processes... so for now, it seems WPA3 is not a good idea...

This is because 802.11w (PMF / Protected Management Frames) is required in WPA3. In WPA2 it is optional, and if you enable it in WPA2, you get the same issue.

The 802.11w issue is quite well known, and currently unsolvable without the wireless adapter firmware being open sourced. I would like to see that one day. Hopefully we can do something, so that Linksys / NXP would like to open source that.

lss4 commented

From what is being discussed in this PR... it seems this is not the only issue with these routers' wireless...

Sadly, I don't think getting wireless firmware open-sourced is possible in the near future. From what I saw, wireless is still a very closed ecosystem with all kinds of proprietary technologies and very limited open source contributions from the vendors (and thus ndiswrapper existed to allow some Windows wireless drivers to work on other platforms to a limited extent).

I got the router a few years ago, because at that time it was believed to be the best router ever for the purpose of OpenWrt, with sufficient wireless support. Guess that turned out to be (not entirely) wrong...

Currently I can still use the router without major issues, including 2.4GHz and 5GHz. It's just that I wouldn't dare touching the configurations ever again because every time I touch it I risk getting all APs disabled and it's also very difficult to bring them back up if that happened. From my experience, re-enabling the adapters is not just a simple start/enable after they become disabled.

Just popping by to note that this is still an issue in the 21.02 RC software. This is a bigger issue now that WPA3 is starting to appear in LuCI defaults, and I had a bit of a headscratcher resolving this. Changing wireless security to WPA2 only resolves the issue.

since i don't have this issue in dd-wrt with wpa3, i'm willing to help here. can you provide crash logs? (kernel)
one thing needs to be considered. as far as i know, openwrt has stripped some softcrypt algorithms out of mac80211. this is a big problem for some chipsets and leads to non-working wpa3 support. the aes_gmac support in mac80211 is not compiled in and not supported by openwrt. this could be the cause.

@BrainSlayer sadly, I rolled back to 19.x firmware because I was still having hanging issues. This was at 4AM local time and I threw in the towel. If it's useful, I could try to purposefully induce the issue this weekend and pull kernel logs.

Edit: and for the record, if you are talking about /sys/kernel/debug/crashlog, there wasn't one. Ethernet internet remained working; it's just DNS, the LuCI web UI, and wireless that would all go down. I could visit direct IPs or cached DNS sites over wired connections just fine. I could also still SSH into the router.

I further tried turning on PMF on 19.x and started seeing the hangs, so that is definitely the issue.

Interesting to see it working in dd-wrt, didn't expect that... I am not sure if the softcrypt algorithms you mentioned are related to 802.11w (PMF / Protected Management Frames). If they are, then this is huge!

Just a hypothesis without any support, see if anybody can verify: I wonder whether some problematic crypto operations in the closed-source firmware are causing the unresponsive behaviour. The softcrypt algorithms in mac80211 could be a workaround for such issues and make it work. Well, just a hypothesis, hopefully someone has more solid info.

no. it's just the missing gmac implementation in mac80211 in openwrt. in dd-wrt i re-added the support for softcrypto gmac. this is required for hostapd / wpa_supplicant authentication only and has nothing to do with the hardware. so you have to modify mac80211 in openwrt, or basically you have to remove the patch which strips out the support in mac80211. then you need to add gmac crypto to your kernel config. that's all. and yes, it's related to PMF too. it's a generic bug in openwrt for a lot of chipsets.

In this case, can we explain this in the OpenWRT bug tracker, so they can perhaps re-add the necessary configuration? As of now I have only WPA2 on a config from a previous build of OpenWRT that blows up any time I try to edit it.

i already talked with an openwrt team member about taking care of that issue. i told them about this 2 years ago already. you can also fix it yourself: just delete 100-remove-cryptoapi-dependencies.patch from packages/mac80211/subsys and then take care to include the necessary cryptoapi modules in your kernel: CRYPTO_CCM, CRYPTO_GCM etc. that should be all.
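To check the second half of that recipe on a build tree, here is a small Python sketch. It assumes a plain kernel `.config`, where the options named above would appear with a `CONFIG_` prefix. The sample config text is invented for illustration. The "etc." in the comment is not expanded here, and OpenWrt may ship these algorithms as separate kernel module packages rather than through `.config`.

```python
# Only the two symbols named in the comment above; the "etc." is left open.
REQUIRED = ("CONFIG_CRYPTO_CCM", "CONFIG_CRYPTO_GCM")


def enabled_symbols(config_text):
    """Map each symbol set to 'y' (built in) or 'm' (module) to its value."""
    enabled = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skips "# CONFIG_FOO is not set" lines as well
        key, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled[key] = value
    return enabled


def missing_crypto(config_text, required=REQUIRED):
    """Return the required symbols that are not enabled in the config text."""
    have = enabled_symbols(config_text)
    return [sym for sym in required if sym not in have]


# Hypothetical sample, not taken from a real OpenWrt build.
SAMPLE = """\
CONFIG_CRYPTO_CCM=y
# CONFIG_CRYPTO_GCM is not set
CONFIG_CRYPTO_CMAC=m
"""

print(missing_crypto(SAMPLE))
```

An empty list would mean both named options are present. It says nothing about whether the mac80211 patch itself has been removed.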

cool! but can anyone explain why the log fills up with timeouts when this problem occurs?

t-m-w commented

This could be great news. I haven't tried it because re-compiling is more than I'm up for at the moment, but if it works (can anyone else confirm?), I think adding this to the official OpenWRT bug tracker is worth a try, to keep a public record if nothing else. I couldn't find a reference to this fix anywhere in there. There aren't even many issues about WPA3 or 802.11w at all, and most have responses saying the user's device doesn't actually use the mwlwifi driver, with no further activity.

So, I was able to recompile and make use of mixed WPA2/WPA3 for a brief period

I had to remove three patches
131-Revert-mac80211-aes-cmac-switch-to-shash-CMAC-driver.patch
132-mac80211-remove-cmac-dependency.patch
100-remove-cryptoapi-dependencies.patch

together with adding the crypto modules

after reboot I was able to connect with WPA3 ... but after a while I started to get a number of messages like
wlan1: failed to remove key (0, 48:a4:72:5d:86:35) from hardware (-5)
ieee80211 phy1: MACREG_REG_INT_CODE: 0x0000
ieee80211 phy1: cmd 0x9111=SetNewStation timed out

quite a lot of wlan0: AP-STA-POSSIBLE-PSK-MISMATCH although the key is correct

reverted back to previous build. So ... I feel there is more to do than simply remove the patches and add the crypto modules

the timeout message indicates that your wifi chipset crashed, but mwlwifi hasn't been changed for a long time. i also cannot reproduce this on my device. are there any special patches for mwlwifi in your build? if yes, remove them and start from scratch

Hi @BrainSlayer!
We need to add to hostapd.conf only these keys?
wpa_key_mgmt=SAE
ieee80211w=2

i'm using

ieee80211w=2
sae_require_mfp=1
sae_password=*******
wpa_key_mgmt=SAE
sae_groups=19 20 21
wpa_pairwise=CCMP
group_mgmt_cipher=AES-128-CMAC
okc=1

wpa_psk instead of sae_password works too, but is more restrictive about key length. it's useful for mixed operation

wpa_pairwise and group_mgmt_cipher depend on the values your card supports. the wrt3200acm is able to do gmp-256 for instance, but not the wrt1900
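As a quick consistency check for settings like the ones listed above, here is a minimal Python sketch. It is not part of hostapd and the helper names are made up. It relies on the usual meaning of `ieee80211w` in hostapd (0 = disabled, 1 = optional, 2 = required) and on WPA3-only SAE needing PMF to be required.

```python
def parse_hostapd(text):
    """Parse key=value lines from a hostapd.conf fragment into a dict."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            opts[key.strip()] = value.strip()
    return opts


def sae_pmf_problems(opts):
    """Return a list of PMF-related problems for an SAE configuration."""
    problems = []
    key_mgmt = opts.get("wpa_key_mgmt", "").split()
    if "SAE" in key_mgmt:
        sae_only = key_mgmt == ["SAE"]
        pmf = opts.get("ieee80211w", "0")
        if sae_only and pmf != "2":
            problems.append("SAE-only network should set ieee80211w=2")
        if not sae_only and pmf == "0":
            problems.append("mixed SAE network should set ieee80211w=1 or 2")
    return problems


# The settings quoted above; the password is a placeholder.
CONFIG = """\
ieee80211w=2
sae_require_mfp=1
sae_password=placeholder
wpa_key_mgmt=SAE
sae_groups=19 20 21
wpa_pairwise=CCMP
group_mgmt_cipher=AES-128-CMAC
okc=1
"""

print(sae_pmf_problems(parse_hostapd(CONFIG)))
print(sae_pmf_problems({"wpa_key_mgmt": "SAE", "ieee80211w": "0"}))
```

The check only looks at whether the options agree with each other. It cannot tell whether a given chipset or firmware actually copes with PMF, which is the open question in this thread.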

Thank you, these are valuable additions.

The patches regarding the stripped crypto support were removed with this commit on the master branch:

openwrt/openwrt@53b6783#diff-0621500c052fa8b9db44fcc11024ad845bd148fb25c3e41417afa43ac93d7898

WOW! Can't wait to try!

I removed the patches and built the kernel (v19.07.8). The connection succeeds, but it disconnects after a few minutes. The kernel does not crash. dmesg output:

[   85.780313] ieee80211 phy0: change: 0x100
[   85.789309] ieee80211 phy0: change: 0x100
[   85.798334] ieee80211 phy0: change: 0x40
[   86.004330] ieee80211 phy0: change: 0x40
[   86.214403] ieee80211 phy0: change: 0x40
[   86.310407] ieee80211 phy0: change: 0x100
[   86.339948] ieee80211 phy0: change: 0x100
[   86.348988] ieee80211 phy0: change: 0x42
[   86.516462] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[   86.522911] br-lan: port 2(wlan0) entered blocking state
[   86.528258] br-lan: port 2(wlan0) entered forwarding state
[  223.680183] ieee80211 phy1: change: 0xffffffff
[  223.767397] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[  223.774585] br-lan: port 3(wlan1) entered blocking state
[  223.779947] br-lan: port 3(wlan1) entered disabled state
[  223.785396] device wlan1 entered promiscuous mode
[  223.808049] ieee80211 phy1: change: 0x100
[  223.817094] ieee80211 phy1: change: 0x42
[  223.991302] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready
[  223.997783] br-lan: port 3(wlan1) entered blocking state
[  224.003129] br-lan: port 3(wlan1) entered forwarding state
[  237.864110] IPv6: ADDRCONF(NETDEV_UP): wlan2: link is not ready
[  237.871889] br-lan: port 4(wlan2) entered blocking state
[  237.877251] br-lan: port 4(wlan2) entered disabled state
[  237.882758] device wlan2 entered promiscuous mode
[  237.887594] br-lan: port 4(wlan2) entered blocking state
[  237.892939] br-lan: port 4(wlan2) entered forwarding state
[  238.502206] IPv6: ADDRCONF(NETDEV_CHANGE): wlan2: link becomes ready
[  759.379451] ieee80211 phy0: cmd 0x9122=UpdateEncryption timed out
[  759.385581] ieee80211 phy0: return code: 0x1122
[  759.390130] ieee80211 phy0: timeout: 0x1122
[  759.394340] wlan0: failed to remove key (0, 06:d7:1e:8e:cb:2b) from hardware (-5)
[  759.429768] ieee80211 phy0: MACREG_REG_INT_CODE: 0x0000
[  779.430437] ieee80211 phy0: cmd 0x9111=SetNewStation timed out
[  779.436311] ieee80211 phy0: return code: 0x1111
[  779.440867] ieee80211 phy0: timeout: 0x1111
[  779.446599] ieee80211 phy0: MACREG_REG_INT_CODE: 0x0000
[  799.454243] ieee80211 phy0: cmd 0x801d=MEMAddrAccess timed out
[  799.460111] ieee80211 phy0: return code: 0x001d
[  799.464660] ieee80211 phy0: timeout: 0x001d
[  799.468870] ieee80211 phy0: MACREG_REG_INT_CODE: 0x0000

Does the kernel for 21.02.0 have the patch removed?

@dxgldotorg from my empirical testing I don't think so, I loaded 21.02 onto my WRT3200 and very shortly after enabling wireless saw the hanging behavior that I'd seen in the test branch

Hi,

I managed to get rid of the timeout issue on a WRT-32X, including when WPA3 is enabled (mixed mode with WPA2). I created a pull request for it.

Hope it fixes the issues for other users.

Originally the diff was a patch for returning the correct result code from the WiFi chip, but as a side effect it fixed the WiFi issue on my router. There is still another issue in the driver, though:

[ 1270.545296] ------------[ cut here ]------------
[ 1270.549972] WARNING: CPU: 0 PID: 19 at target-arm_cortex-a9+vfpv3-d16_musl_eabi/linux-mvebu_cortexa9/mwlwifi-2021-09-15-4adce307/mac80211.c:841 mwl_mac80211_ampdu_action+0x354/0x3d8 [mwlwifi]
[ 1270.567095] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_table nf_conntrack ipt_REJECT xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc rfcomm nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 mwifiex_sdio mwifiex iptable_mangle iptable_filter ip_tables hidp hci_uart crc_ccitt btusb btmrvl_sdio btmrvl btintel bnep bluetooth hid evdev input_core mwlwifi mac80211 cfg80211 compat nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ecdh_generic ecc sha256_generic libsha256 seqiv jitterentropy_rng drbg kpp hmac ghash_generic ghash_arm_ce gf128mul gcm ecb ctr cmac ccm gpio_button_hotplug
[ 1270.638183] CPU: 0 PID: 19 Comm: kworker/u4:1 Not tainted 5.4.143 #0
[ 1270.644563] Hardware name: Marvell Armada 380/385 (Device Tree)
[ 1270.650566] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 1270.656609] [<c010eeac>] (unwind_backtrace) from [<c010b018>] (show_stack+0x10/0x14)
[ 1270.664391] [<c010b018>] (show_stack) from [<c0730624>] (dump_stack+0x94/0xa8)
[ 1270.671650] [<c0730624>] (dump_stack) from [<c01278c0>] (__warn+0xbc/0xd8)
[ 1270.678556] [<c01278c0>] (__warn) from [<c012792c>] (warn_slowpath_fmt+0x50/0x94)
[ 1270.686082] [<c012792c>] (warn_slowpath_fmt) from [<bf1c5bac>] (mwl_mac80211_ampdu_action+0x354/0x3d8 [mwlwifi])
[ 1270.696341] [<bf1c5bac>] (mwl_mac80211_ampdu_action [mwlwifi]) from [<bf13e77c>] (ieee80211_request_smps_mgd_work+0x380/0x3d0 [mac80211])
[ 1270.708791] [<bf13e77c>] (ieee80211_request_smps_mgd_work [mac80211]) from [<bf13e0b0>] (ieee80211_ba_session_work+0x2b4/0x2b8 [mac80211])
[ 1270.721308] [<bf13e0b0>] (ieee80211_ba_session_work [mac80211]) from [<c013f8b8>] (process_one_work+0x218/0x470)
[ 1270.731529] [<c013f8b8>] (process_one_work) from [<c013fb54>] (worker_thread+0x44/0x5dc)
[ 1270.739657] [<c013fb54>] (worker_thread) from [<c01452ec>] (kthread+0x14c/0x150)
[ 1270.747086] [<c01452ec>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[ 1270.754339] Exception stack(0xdf4f5fb0 to 0xdf4f5ff8)
[ 1270.759412] 5fa0:                                     00000000 00000000 00000000 00000000
[ 1270.767624] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1270.775837] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 1270.782492] ---[ end trace b8a7957484f675e5 ]---

I can confirm that trying to use WPA3 on a new WRT3200ACM will cause the router to halt and need a reboot. WPA2 without the 802.11w (PMF / Protected Management Frames) option works correctly.

SebTM commented

@krjdev Did you find anything else that helped?

I can confirm the patch is not in OpenWRT 21.02.1.

Thanks for checking. Unless I see a specific CVE in 19.x I'm keeping my AP there for the time being.

21.02.2 Seems to be okay now, at least for WPA2 only. Given its release today, I figured I'd put it out there. https://lwn.net/Articles/886489/

I've been using the rc that is essentially identical to the official release for a couple weeks with no crashes.

The Wiki also specifically calls out a crash fix for 20.02 branches: https://openwrt.org/toh/linksys/wrt3200acm#notes

OpenWRT 21.02.2 on WRT3200ACM works with WPA2 but not WPA3. One strange problem is that hostapd would apparently fail to change the hardware mode -back- to WPA2 and need to fully reboot the device to get it into WPA2 mode. NetworkManager clients would report that the device required a PSK but could not support it when in WPA3 mode. WPA2 works fine on all client OSes I've tried.

t-m-w commented

To clarify: In 21.02.2, nothing has been fixed related to this issue, at least for me.

Even with WPA2, 802.11w still causes the same behavior as it always has for me -- hanging. And because WPA3 requires 802.11w, it also still hangs the router, as it always has.
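For anyone auditing an existing setup, the rule described here (anything that turns on 802.11w is a risk on this router) can be sketched in Python. The option names `encryption` and `ieee80211w` and the values `psk2`, `sae` and `sae-mixed` follow OpenWrt's wireless configuration and are not taken from this thread. The interface names are hypothetical, and treating "optional" PMF (`ieee80211w=1`) as risky is an assumption.

```python
def pmf_risk(iface):
    """Return a reason string if this wifi-iface is likely to enable 802.11w."""
    encryption = iface.get("encryption", "none")
    pmf = iface.get("ieee80211w", "")
    if "sae" in encryption:
        # WPA3 (SAE) brings 802.11w along, as explained earlier in the thread.
        return f"encryption '{encryption}' uses WPA3 (SAE), which needs 802.11w"
    if pmf in ("1", "2"):
        # Assumption: optional PMF is treated as risky too.
        return f"ieee80211w={pmf} turns on 802.11w"
    return None


# Hypothetical wifi-iface sections, written as plain dicts.
ifaces = {
    "default_radio0": {"encryption": "psk2"},
    "default_radio1": {"encryption": "sae-mixed"},
    "guest": {"encryption": "psk2", "ieee80211w": "2"},
}

for name, opts in ifaces.items():
    reason = pmf_risk(opts)
    status = "RISK: " + reason if reason else "ok"
    print(name + ": " + status)
```

Only the first interface in this sample would be reported as ok. That matches the workaround people in this thread settled on: WPA2 with 802.11w left off.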

Aghh this has been causing me so much grief! I just upgraded from 19.07.8 to 21.02.3 and encountered the same issue. I attempted an upgrade to 21.02.0 earlier in the year and put it down to it being a bug in openwrt that would get fixed in a few point releases.

Any word whether it is fixed in 22.03.0-rc1?

https://openwrt.org/releases/22.03/changelog-22.03.0-rc1

The change log does not mention it. This will have to be fixed in the firmware, and from what I understand the firmware is closed source and has not seen any development for a while. It looks like there is no option but to purchase another device that is better supported by the vendor.

Bummer

I saw that the 22.03.0-rc1 kernel has softcrypt enabled (mac80211 changelog) so I tried it out on my WRT3200ACM. With WPA3 enabled, I see the same behavior as @Jakobu5 did with the modified 19.07 kernel. My clients can associate and very small amounts of traffic work (DHCP, IPv6 autoconf, pings) but as soon as I put any meaningful traffic through it, the chipset crashes. I also see the same behavior when using WPA2 with PMF required, as expected.

Since @BrainSlayer had said that this was working in DD-WRT, I installed the latest (beta 05-16-2022-r48886) and gave it a try. No dice there either. I see exactly the same behavior as OpenWRT 22.03.

It looks to me like @johnnyxwan was right all along, this isn't going to work without changes to the firmware.

Having issues with wrt32x wifi, issue occurs on 21.02.2, 21.03, and a snapshot build of the latest purefusion available. Log attached.
[Screenshot: Screenshot_20220705-155538_Chrome]

any news?

I just ended up giving mine to my parents and setting it up for WPA2.

I gave mine to my brother and set it up for WPA2. He is really happy. Good solid unit but no more support. It is time to move on to something else if you want more advanced options.

That is what it's looking like... Shame on Linksys for keeping some kind of firmware closed source(?). I hear that is the core reason why WPA3 is difficult to implement on this router.

WPA2 has been defeated long ago, and its "mitigation patch" 802.11w is not even supported by the router!

Isn't it more the fault of NXP who bought out Marvell's Wi-Fi chipset portfolio?

Is #416 not working for you? It's been in OpenWRT Snapshots since Nov 25th and according to the changelog it's going to be included in the upcoming OpenWRT 23.05.3 update too.
I've been running a WPA3-only wifi on my WRT1200AC for 3 months now and it is about as stable as WPA2 was before then, but it may depend on the specific wifi chips the clients use.

Edit: Although yeah, the WRT3200ACM specifically is not fixed unfortunately.

Hello,

To understand what has been done recently on this driver, this WPA3 has only been unlocked for WRT1900AC (8864 + 8897).
Normally, it was already functional for chipsets based on 8997.

So why doesn't it work on 8964?
2 hypotheses:

  • It could be firmware.
  • It could be at the driver level.

Personally, I think it's due to the code.
=> The pcie_tx_skbs_ndp function may be much improved, but I have serious doubts about the way the driver handles packets. By unblocking the probresponse rewriting feature, you get kernel crashes.

As I don't have a WRT32 at hand, I haven't been able to study the problem.

Best regards,

I found a tarball from NXP website, it contains code and firmware for 88W8964, it might help to fix issues for WRT3200ACM

https://github.com/wongsyrone/nxp-W9064-PR25-25.2.2.0-P1077-D2082-WFO