MT7981 5GHz occasionally cannot disconnect clients that have left and causes bad performance.
victor186 opened this issue · 29 comments
Is the second device also connected to the network?
I see a really bad signal from it; communicating with such a device can greatly reduce performance.
The devices on the list are on 2.4GHz.
Can you list your wifi clients(device models)?
I can't, because this device is running as an AP in a restaurant, serving both the administrative and the customer Wi-Fi.
It looks like the combination of Qualcomm QCA9377 + Windows 10 driver + 5GHz can cause this. No problems on the 2.4GHz band.
Do you have driver 10.0.0.1272 for Windows installed?
I don't understand: is a 5GHz Wi-Fi adapter with a QCA9377 causing bad performance on the 5GHz network? I don't have a QCA9377 on the network, and the router is MediaTek.
I don't have QCA9377 on network
How can you be sure?
device is running as AP on a restaurant for administrative and client's Wi-Fi
The clients only use smartphones.
The only PC on Wi-Fi uses a Realtek Wi-Fi adapter.
If a QCA9377 can affect a 5GHz AP on mt76 + mt7915 (MT7981), then maybe some other clients can do the same.
I'm not the owner of a QCA9377. I just helped a user isolate the problem on an OpenWrt 23.05.5 MT7981 device.
@nbd168 what do you think about this?
One thing you could try is copy the latest MT7981 firmware from https://github.com/openwrt/mt76/tree/master/firmware to your device. If that doesn't help, trying a recent snapshot might also be a good idea.
One thing you could try is copy the latest MT7981 firmware from https://github.com/openwrt/mt76/tree/master/firmware to your device.
Already done this, it didn't help.
If that doesn't help, trying a recent snapshot might also be a good idea.
That user didn't want to experiment with a snapshot. Connecting the QCA9377 to the 2.4GHz AP solved the issue with the 5GHz AP for him.
I'd say there are too few details for us to be able to help you.
Openwrt 23.05.5. H3C Magic NX30 Pro.
Same issue here. I have encountered it several times.
Almost zero speed (1 kb/s) over 5GHz Wi-Fi. That is enough for DHCP, but anything else is broken, even ping.
I noticed that when this happens, there are 2 dead clients (which may have left Wi-Fi range at the same time) on the LuCI Wi-Fi page, with an RX Rate / TX Rate of 6.0 Mbit/s, 20 MHz. If I manually click the "Disconnect" button, Wi-Fi works again immediately.
More info
Also, when I check the log, it keeps showing AP-STA-POLL-OK for the two offline clients. This started when they went out of Wi-Fi range and continued until I clicked the LuCI "Disconnect" button.
P.S. OFFLINE:MAC:1 OFFLINE:MAC:2 are clients that went away.
Wed Nov 20 19:33:33 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:1**
Wed Nov 20 19:35:31 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:2**
Wed Nov 20 19:38:44 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:1**
Wed Nov 20 19:40:51 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:2**
Wed Nov 20 19:44:03 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **OFFLINE:MAC:1**
...
Wed Nov 20 20:06:42 2024 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED **OFFLINE:MAC:1**
Wed Nov 20 20:06:44 2024 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED **OFFLINE:MAC:2**
Wed Nov 20 20:06:47 2024 daemon.info hostapd: phy1-ap0: STA **OFFLINE:MAC:1** IEEE 802.11: deauthenticated due to local deauth request
Wed Nov 20 20:06:49 2024 daemon.info hostapd: phy1-ap0: STA **OFFLINE:MAC:2** IEEE 802.11: deauthenticated due to local deauth request
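To spot this pattern without reading the log by eye, the AP-STA-POLL-OK events can be counted per station. The following is only a minimal sketch: the function name count_poll_ok and the default minimum of 3 are made up for illustration, and it assumes the MAC address is the last field of each hostapd line, as in the log above.

```shell
#!/bin/sh
# Count AP-STA-POLL-OK events per station from syslog text on stdin.
# Prints "<count> <mac>" for every station seen at least min_count times.
# (count_poll_ok and its default of 3 are illustrative, not part of OpenWrt.)
count_poll_ok() {
	min_count=${1:-3}
	awk -v min="$min_count" '
		/AP-STA-POLL-OK/ { seen[$NF]++ }   # last field is the station MAC
		END { for (mac in seen) if (seen[mac] >= min) print seen[mac], mac }
	' | sort -rn
}

# On the router (assumption: the log is read with logread):
#   logread | count_poll_ok 3
```

A high count is only a hint: a client that is idle but still reachable also produces AP-STA-POLL-OK, so the result should be cross-checked against the signal level of the station.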
When I restarted the 5GHz Wi-Fi a few minutes later, there was another suspicious log.
Wed Nov 20 20:13:06 2024 kern.warn kernel: [2135649.716364] Ignoring NSS change in VHT Operating Mode Notification from **OFFLINE:MAC:1** with invalid nss 2
Wed Nov 20 20:13:06 2024 kern.info kernel: [2143605.339316] device phy1-ap0 left promiscuous mode
Wed Nov 20 20:13:06 2024 kern.info kernel: [2143605.354371] br-lan: port 5(phy1-ap0) entered disabled state
Wed Nov 20 20:13:07 2024 daemon.notice wpa_supplicant[1538]: Set new config for phy phy1
Wed Nov 20 20:13:07 2024 daemon.notice hostapd: Set new config for phy phy1: /var/run/hostapd-phy1.conf
Wed Nov 20 20:13:07 2024 daemon.notice hostapd: Reload config for bss 'phy1-ap0' on phy 'phy1'
Wed Nov 20 20:13:07 2024 daemon.notice hostapd: phy1-ap0: AP-STA-DISCONNECTED **AN:ONLINE:CLIENT:MAC:1**
Wed Nov 20 20:13:08 2024 daemon.notice hostapd: Reloaded settings for phy phy1
Wed Nov 20 20:13:08 2024 daemon.notice netifd: Wireless device 'radio1' is now up
Wed Nov 20 20:13:08 2024 daemon.notice netifd: Network device 'phy1-ap0' link is up
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.148600] br-lan: port 5(phy1-ap0) entered blocking state
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.154384] br-lan: port 5(phy1-ap0) entered disabled state
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.160337] device phy1-ap0 entered promiscuous mode
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.165646] br-lan: port 5(phy1-ap0) entered blocking state
Wed Nov 20 20:13:08 2024 kern.info kernel: [2143607.171424] br-lan: port 5(phy1-ap0) entered forwarding state
Wed Nov 20 20:13:09 2024 daemon.info dnsmasq[1]: read /etc/hosts - 12 names
Wed Nov 20 20:13:09 2024 daemon.info dnsmasq[1]: read /tmp/hosts/dhcp.cfg01411c - 4 names
Wed Nov 20 20:13:09 2024 daemon.info dnsmasq-dhcp[1]: read /etc/ethers - 0 addresses
...
Wireless config
cat /etc/config/wireless

config wifi-device 'radio0'
	option type 'mac80211'
	option path 'platform/18000000.wifi'
	option channel '1'
	option band '2g'
	option htmode 'HT20'
	option country 'CN'
	option cell_density '0'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option ssid 'ssid1'
	option encryption 'psk2+ccmp'
	option key 'WIFIPASSWD'

config wifi-device 'radio1'
	option type 'mac80211'
	option path 'platform/18000000.wifi+1'
	option channel '149'
	option band '5g'
	option htmode 'HE80'
	option country 'CN'
	option cell_density '0'
	option txpower '27'

config wifi-iface 'default_radio1'
	option device 'radio1'
	option network 'lan'
	option mode 'ap'
	option ssid 'ssid2'
	option encryption 'sae-mixed'
	option key 'WIFIPASSWD'
Possibly related:
openwrt/openwrt#14415
I reproduced this bug.
When a client leaves Wi-Fi coverage, there is some probability (my rough guess: 10%) that the bug above occurs.
It is almost the same as openwrt/openwrt#14415, but it also causes bad Wi-Fi performance. (In my case it is extremely bad: < 1 kb/s. Other clients can still connect, but the link is only good enough for DHCP to complete; anything else is broken, even ping.)
The log keeps showing AP-STA-POLL-OK after the client has left. (P.S. I added option max_inactivity '60'.)
...
Thu Nov 21 09:25:38 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:26:46 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:27:56 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:29:04 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:30:24 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:31:33 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:32:39 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:33:44 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
Thu Nov 21 09:34:51 2024 daemon.notice hostapd: phy1-ap0: AP-STA-POLL-OK **WENT:AWAY:CLIENT:MAC**
...
iw shows the client still "associated".
iw dev phy1-ap0 station dump
Station **WENT:AWAY:CLIENT:MAC** (on phy1-ap0)
inactive time: 46190 ms
rx bytes: 7315589
rx packets: 52352
tx bytes: 66444699
tx packets: 69473
tx retries: 6987
tx failed: 7033
rx drop misc: 2
signal: -95 [-97, -99] dBm
signal avg: -91 [-93, -95] dBm
tx bitrate: 6.0 MBit/s
tx duration: 83677141 us
rx bitrate: 6.0 MBit/s
rx duration: 4720659 us
last ack signal:-96 dBm
avg ack signal: -95 dBm
airtime weight: 256
authorized: yes
authenticated: yes
associated: yes
preamble: short
WMM/WME: yes
MFP: no
TDLS peer: no
DTIM period: 2
beacon interval:100
short preamble: yes
short slot time:yes
connected time: 8708 seconds
associated at [boottime]: 2183028.795s
associated at: 1732143676976 ms
current time: 1732152384528 ms
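The fields that matter in a dump like this (signal and the share of failed transmissions) can be pulled out per station with a few lines of awk. This is a rough sketch rather than an established tool: the name station_summary is hypothetical, and it assumes the field labels shown in the dump above.

```shell
#!/bin/sh
# Summarise `iw dev <iface> station dump` text read from stdin.
# Prints one line per station: "<mac> signal=<dBm> tx_failed=<failed>/<packets>".
# (station_summary is an illustrative name, not an existing utility.)
station_summary() {
	awk '
		$1 == "Station"                   { mac = $2 }
		$1 == "signal:"                   { sig[mac] = $2 }   # not "signal avg:"
		$1 == "tx" && $2 == "packets:"    { txp[mac] = $3 }
		$1 == "tx" && $2 == "failed:"     { txf[mac] = $3 }
		END {
			for (m in sig)
				printf "%s signal=%s tx_failed=%s/%s\n", m, sig[m], txf[m], txp[m]
		}
	'
}

# On the router:
#   iw dev phy1-ap0 station dump | station_summary
```

For the station in the dump above this would report a signal of -95 dBm with 7033 failed out of 69473 transmitted packets, roughly 10%.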
P.S. The device above is a smartphone with a Snapdragon FastConnect 6800 (however, I do believe other clients can do the same). It left the Wi-Fi range an hour ago and is kilometers away from the AP.
If I manually click the "Disconnect" button in LuCI, Wi-Fi works again immediately (no restart needed).
I'm using the official, unmodified OpenWrt 23.05.5 image. At first I thought openwrt/openwrt#14415 was using a fork of OpenWrt with a modified driver, but I misunderstood: they had enabled /sys/module/mt7915e/parameters/wed_enable.
I did not set wed_enable.
cat /sys/module/mt7915e/parameters/wed_enable
N
That makes sense, because the router serves public Wi-Fi and has clients joining and leaving the network all the time.
And I noticed in LuCI some clients with a signal of -9x dBm that never disconnect, like in your example: a client that is out of range never disappears.
Sorry, my router is my main device, so it is hard for me to experiment with it. But I can provide logs if needed.
@victor186 I feel this is a common bug for all MT7981 devices, but it happens only occasionally and is hard to reproduce and notice.
Maybe we could change the title to make it easier for more users to find?
"MT7981 5GHz occasionally cannot disconnect clients that have left and causes bad performance."
Done
A dirty temporary fix. Tested, works for me. I do not know if there are any side effects.
Run this script every minute via cron.
It will "disconnect" all clients that have a very low signal strength (these should be the clients that have already left Wi-Fi coverage but are still stuck as "associated").
#!/bin/sh
# threshold (dBm)
thr=-90
# add other interface names if any, e.g. "phy1-ap0 phy1-ap1 phy1-ap2"
wlanlist="phy1-ap0"

disconnect() {
	mac=$1
	wlan=$2
	rssi=$3
	logger -t disconnected-client-killer "disconnecting client at $wlan $mac with $rssi dBm (thr=$thr)"
	# "ban_time" prohibits the client from reassociating for the given number of milliseconds.
	ubus call "hostapd.$wlan" del_client "{\"addr\":\"$mac\", \"reason\":5, \"deauth\":true, \"ban_time\":1000}"
}

for wlan in $wlanlist; do
	# station lines contain "SNR"; field 1 is the MAC, field 2 the signal in dBm
	iwinfo "$wlan" assoclist | grep SNR | while read -r mac rssi rest; do
		# skip entries whose signal is not a negative number (e.g. "unknown")
		case "$rssi" in -[0-9]*) ;; *) continue ;; esac
		if [ "$rssi" -lt "$thr" ]; then
			disconnect "$mac" "$wlan" "$rssi"
		fi
	done
done
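For the "every minute via cron" part, a crontab entry has to be added for the script. Below is a minimal sketch of one way to do that without adding the line twice; the helper name install_cron_entry and the script path /usr/bin/disconnected-client-killer.sh are assumptions, and /etc/crontabs/root is the usual root crontab on OpenWrt.

```shell
#!/bin/sh
# Append a once-a-minute cron entry for the given script, unless the exact
# line is already present. (install_cron_entry is an illustrative helper.)
# Usage: install_cron_entry CRONTAB_FILE SCRIPT_PATH
install_cron_entry() {
	crontab_file=$1
	script=$2
	entry="* * * * * $script"
	touch "$crontab_file"
	# -F fixed string, -x whole-line match, -q quiet
	grep -Fxq -- "$entry" "$crontab_file" || printf '%s\n' "$entry" >> "$crontab_file"
}

# On the router (the script path is an assumption):
#   chmod +x /usr/bin/disconnected-client-killer.sh
#   install_cron_entry /etc/crontabs/root /usr/bin/disconnected-client-killer.sh
#   /etc/init.d/cron enable && /etc/init.d/cron restart
```

Matching the whole line keeps the helper safe to run repeatedly, for example from a provisioning script.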
Maybe adding patch similar to https://github.com/freifunk-gluon/gluon/blob/main/patches/openwrt/0009-mt76-include-fixes-for-MT7603-MT7612.patch would help?
You can try this patch from mtk
This patch definitely does some good: before, I had an intermittent packet-loss indication every minute or less in games; now that is completely fixed with this patch.
I tried this patch, and speed dropped by half with WED inactive.
I don't notice a speed difference with WED enabled.
The client below has left the house, but the MT6000 still sees/tracks it with a -92/-92 RSSI, ugh.
Using a pretty recent OpenWrt SNAPSHOT, r28242, with:
mt798x-wmac 18000000.wifi: WM Firmware Version: ____000000, Build Time: 20240823160721
mt798x-wmac 18000000.wifi: WA Firmware Version: DEV_000000, Build Time: 20240823160840
Stressing roaming with DAWN and/or forcing disconnects by walking out of bounds seems to trigger that odd condition.
I might try the cron job workaround, since this is affecting my mesh network: batctl ends up showing nodes with crawling link speeds of 0.3.
Observed similar AP-STA-POLL-OK logs with my Flint 2 on 2.4G WiFi.
I have adapted your solution and started using it to work around this in my case too.
gist:openwrt-mt76-disconnect-workaround
This version can be added to the init / rc scripts, since it spawns a subshell on boot that keeps checking for the condition every N seconds.
Another slight change is that there is no need to set a threshold; it instead treats a client as gone if its signal is lower than the noise floor.
We understand this is just a temporary workaround while we wait for the real solution, and we also wonder whether the MTK reference about losing ACKs on AX chips is related.
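The noise-floor comparison described above could look roughly like the sketch below. This is not the code from the gist: the function name below_noise_floor is made up, and it assumes that each station line of iwinfo assoclist has the form "MAC  SIGNAL dBm / NOISE dBm (SNR n) ...".

```shell
#!/bin/sh
# is_int: succeed if $1 is an (optionally negative) integer.
is_int() {
	case "${1#-}" in ''|*[!0-9]*) return 1 ;; esac
}

# Read `iwinfo <iface> assoclist` text on stdin and print "<mac> <signal> <noise>"
# for every station whose signal is below the reported noise floor.
# (below_noise_floor is an illustrative name; the line layout is an assumption.)
below_noise_floor() {
	grep SNR | while read -r mac signal unit1 slash noise rest; do
		# skip lines where either value is missing or not numeric (e.g. "unknown")
		is_int "$signal" || continue
		is_int "$noise" || continue
		if [ "$signal" -lt "$noise" ]; then
			echo "$mac $signal $noise"
		fi
	done
}

# On the router:
#   iwinfo phy1-ap0 assoclist | below_noise_floor
```

Each MAC printed this way could then be passed to the same ubus del_client call used in the cron script. Comparing against the noise floor avoids hard-coding a threshold, at the cost of depending on the driver reporting a sensible noise value.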