mt7921u active monitor mode breaks driver
ZerBea opened this issue · 87 comments
I got an ALFA AWUS036AXML. Setting active monitor mode causes the driver to stop.
It took me several days to figure out what went wrong, and the many tests along the way have made this thread long.
This is the conclusion (the entire history is below).
Steps to reproduce with common tools (iw, ip link and tshark):
monitor mode:
$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured
active monitor mode:
$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C
0 packets captured
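The two sequences above differ only in the iw call, so for repeated testing they can be wrapped in one small helper. This is only a sketch: the function name monitor_cmds is made up for illustration, and it just prints the same ip and iw commands shown above instead of running them.

```shell
#!/bin/sh
# Print the commands that put IFACE into monitor mode.
# MODE is "passive" (plain monitor) or "active" (monitor that ACKs frames).
# Hypothetical helper: it only prints; pipe the output to sh to execute it.
monitor_cmds() {
    iface=$1
    mode=${2:-passive}
    case $mode in
        passive) iw_args="set type monitor" ;;
        active)  iw_args="set monitor active" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "sudo ip link set $iface down"
    echo "sudo iw dev $iface $iw_args"
    echo "sudo ip link set $iface up"
}
```

Usage would be something like `monitor_cmds wlp22s0f0u4i3 active | sh`, followed by `tshark -i wlp22s0f0u4i3` to see whether frames still arrive.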
Background:
In active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device.
This feature is really useful to perform PMKID attacks.
At the moment, active monitor mode is working on:
mt76x0u
mt76x2u
It is not working on:
mt7601u
mt7921u
I see three options:
hcxdumptool does not set active monitor mode by default even if the driver reports that it is supported.
That has been done by this commit:
ZerBea/hcxdumptool@8d3f24e
active monitor mode capability should not be reported by the driver
[code]
mt7601u:
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
mt7921u:
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
[/code]
active monitor mode should be fixed by driver
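For scripts, the `iw list | grep active` check above can be turned into a yes/no test. A minimal sketch: the function name advertises_active_monitor is hypothetical, and it reads `iw list` output from stdin so it can be tried without hardware. As this issue shows, a match only means the capability is reported, not that it works.

```shell
# Succeeds if the `iw list` output on stdin advertises active monitor mode.
# Hypothetical helper; it checks the reported capability only, not behavior.
advertises_active_monitor() {
    grep -q 'Device supports active monitor'
}
```

Possible usage: `iw list | advertises_active_monitor && echo "active monitor advertised"`.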
Looks like this driver (https://github.com/openwrt/mt76) doesn't compile out of the box on Linux 6.6.1:
$ make -C /lib/modules/`uname -r`/build M=$PWD
make: Entering directory '/usr/lib/modules/6.6.1-arch1-1/build'
CC [M] /tmp/mt76/mmio.o
In file included from /tmp/mt76/mt76.h:19,
from /tmp/mt76/mmio.c:6:
/tmp/mt76/testmode.h:196:32: error: array type has incomplete element type 'struct nla_policy'
196 | extern const struct nla_policy mt76_tm_policy[NUM_MT76_TM_ATTRS];
| ^~~~~~~~~~~~~~
/tmp/mt76/mt76.h: In function 'mt76_put_page_pool_buf':
/tmp/mt76/mt76.h:1647:9: error: implicit declaration of function 'page_pool_put_full_page' [-Werror=implicit-function-declaration]
1647 | page_pool_put_full_page(page->pp, page, allow_direct);
| ^~~~~~~~~~~~~~~~~~~~~~~
/tmp/mt76/mt76.h: In function 'mt76_get_page_pool_buf':
/tmp/mt76/mt76.h:1655:16: error: implicit declaration of function 'page_pool_dev_alloc_frag' [-Werror=implicit-function-declaration]
1655 | page = page_pool_dev_alloc_frag(q->page_pool, offset, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~
/tmp/mt76/mt76.h:1655:14: error: assignment to 'struct page *' from 'int' makes pointer from integer without a cast [-Werror=int-conversion]
1655 | page = page_pool_dev_alloc_frag(q->page_pool, offset, size);
| ^
cc1: all warnings being treated as errors
make[2]: *** [scripts/Makefile.build:243: /tmp/mt76/mmio.o] Error 1
make[1]: *** [/usr/lib/modules/6.6.1-arch1-1/build/Makefile:1913: /tmp/mt76] Error 2
make: *** [Makefile:234: __sub-make] Error 2
make: Leaving directory '/usr/lib/modules/6.6.1-arch1-1/build'
BTW:
I went back
to kernel 6.5.1 (Debian kernel config) -> neither monitor mode nor packet injection is working
to kernel 6.1.21 (Raspbian kernel config) -> neither monitor mode nor packet injection is working
update
After I got this issue report:
ZerBea/hcxdumptool#376
I did some more tests.
If the interface is on monitor mode:
$ sudo hcxdumptool -m wlp22s0f0u9u3i3
$ iw dev
phy#12
Interface wlp22s0f0u9u3i3
ifindex 15
wdev 0xc00000001
addr 00:c0:ca:b5:74:e6
type monitor
channel 1 (2412 MHz), width: 20 MHz (no HT), center1: 2412 MHz
txpower 3.00 dBm
multicast TXQ:
qsz-byt qsz-pkt flows drops marks overlmt hashcol tx-bytes tx-packets
0 0 0 0 0 0 0 0
it will receive packets:
$ tshark -i wlp22s0f0u9u3i3
Capturing on 'wlp22s0f0u9u3i3'
263 packets captured
But once the first frame has been injected, everything stops:
$ tshark -i wlp22s0f0u9u3i3
Capturing on 'wlp22s0f0u9u3i3'
^C
0 packets captured
Looks like frame injection killed the driver.
You might want to retest with the recently released firmware:
Section 2.
Thanks for that information. I'll give it a try, but I still think it is related to the driver.
This is the latest working firmware:
Build Time: 20230526130958
This one does not load:
Build Time: 20231109190918
This one does not load: Build Time: 20231109190918
It loads here:
$ ethtool -i wlx00c0cab37abb
driver: mt7921u
version: 6.5.0-0.deb12.1-amd64
firmware-version: ____010000-20231109190959
Adapter: Alfa AXML
Distro: Debian 12
Remember that wifi firmware for the 7921 requires two firmware files:
WIFI_MT7961_patch_mcu_1_2_hdr.bin
WIFI_RAM_CODE_MT7961_1.bin
There is also a bluetooth file, but you won't be using it, so you can delete the file from the system:
BT_RAM_CODE_MT7961_1_2_hdr.bin
I double checked this:
old firmware:
[16148.856186] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20230526131214
[16148.879434] mt7921u 1-9.3:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
new firmware:
[41144.190166] usb 1-9.3: new high-speed USB device number 11 using xhci_hcd
[41144.321418] usb 1-9.3: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[41144.321422] usb 1-9.3: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[41144.321424] usb 1-9.3: Product: Wireless_Device
[41144.321426] usb 1-9.3: Manufacturer: MediaTek Inc.
[41144.321427] usb 1-9.3: SerialNumber: 000000000
[41144.428601] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20231109191416
only the BT firmware has been loaded.
$ iw dev
$
All three bin's have been replaced:
$ ls *MT7961*.*
BT_RAM_CODE_MT7961_1_2_hdr.bin.zst
WIFI_RAM_CODE_MT7961_1.bin.zst
WIFI_MT7961_patch_mcu_1_2_hdr.bin.zst
I gave it another try without compressing the files:
[41864.599670] usb 1-9.3: USB disconnect, device number 16
[41868.761868] usb 1-9.3: new high-speed USB device number 17 using xhci_hcd
[41868.893554] usb 1-9.3: New USB device found, idVendor=0e8d, idProduct=7961, bcdDevice= 1.00
[41868.893561] usb 1-9.3: New USB device strings: Mfr=6, Product=7, SerialNumber=8
[41868.893563] usb 1-9.3: Product: Wireless_Device
[41868.893565] usb 1-9.3: Manufacturer: MediaTek Inc.
[41868.893567] usb 1-9.3: SerialNumber: 000000000
[41869.006650] Bluetooth: hci0: HW/SW Version: 0x008a008a, Build Time: 20231109191416
same result.
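What separates the working log from the failing ones is the `mt7921u ... Build Time:` line; in the failing case only the Bluetooth line shows up. A small filter can make that check explicit. This is a sketch that assumes kernel log lines in the format quoted above; wifi_fw_loaded is a made-up name, and it reads log text from stdin.

```shell
# Succeeds if some mt7921u log line on stdin reports a firmware build time.
# Hypothetical helper; assumes the dmesg format quoted in this thread.
wifi_fw_loaded() {
    grep -q 'mt7921u .*Build Time: [0-9]'
}
```

Possible usage: `sudo dmesg | wifi_fw_loaded || echo "Wi-Fi firmware not loaded"`. Older lines from a previous plug-in may still be in the kernel log, so it is safer to feed it only the lines after the latest "New USB device found" message.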
BTW:
Regardless of whether the new firmware has been compressed by zstd, and regardless of which port the device is connected to (USB2 or USB3), after a while I got this error:
connected to USB2 port
[41930.422262] usb 1-9.3: device not accepting address 17, error -71
connected to USB3 port
[41961.868976] usb 1-4: device descriptor read/64, error -110
I'm on kernel
$ uname -r
6.6.3-arch1-1
$ uname -r
6.5.0-0.deb12.1-amd64
I haven't gone to kernel 6.6 yet on this system, which is my main dev box, but will investigate doing a test with 6.6 on another box.
Keep in mind that loading the bluetooth firmware is basically worthless as you can't run USB3 and bluetooth together on a USB WiFi adapter. I'm pretty sure the communication between Mediatek and the makers was poor in this respect, as the bt firmware should not be loaded if the adapter is in USB3 mode.
At the moment, I'm running out of ideas.
I got nothing. I've never seen firmware compressed but if you say it works, it probably works. Got any other distros to try?
zstd compression is not something new:
https://www.phoronix.com/news/2021-Linux-Zstd-Firmware
and it is working like a charm.
BTW:
At every test my reference is an ALFA AWUS036ACM (mt76x2u) and an ALFA AWUS036ACHM (mt76x0u).
If both devices/drivers are working as expected, everything seems to be fine.
After that, I'll go hunting for the problems of the new device/driver.
I wonder if you had a bad download of one of the wifi firmware files?
Good point - let's compare it:
$ md5sum BT_RAM_CODE_MT7961_1_2_hdr.bin
f8e386541ca02a6311d7c0d9441fbab7 BT_RAM_CODE_MT7961_1_2_hdr.bin
$ md5sum WIFI_MT7961_patch_mcu_1_2_hdr.bin
0a4d833efe94a56c502de8a38405d8fe WIFI_MT7961_patch_mcu_1_2_hdr.bin
$ md5sum WIFI_RAM_CODE_MT7961_1.bin
8d0a4f6dc2d01a8b442ae0b8d76d9122 WIFI_RAM_CODE_MT7961_1.bin
Here are my results from /lib/firmware/mediatek
$ md5sum WIFI_MT7961_patch_mcu_1_2_hdr.bin
0a4d833efe94a56c502de8a38405d8fe WIFI_MT7961_patch_mcu_1_2_hdr.bin
$ md5sum WIFI_RAM_CODE_MT7961_1.bin
8d0a4f6dc2d01a8b442ae0b8d76d9122 WIFI_RAM_CODE_MT7961_1.bin
I can't check the BT firmware because it does not exist on my system. I delete it to prevent it from loading and using resources. It should not load given that BT is turned off in our adapters (Alfa AXML) but it does... that is a programming mistake that needs to be corrected.
Thanks. The md5 hashes match.
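Instead of comparing the hashes by eye, the sums could be recorded on one machine and verified on the other with `md5sum -c`. A minimal sketch using only coreutils; the function names record_fw_sums and check_fw_sums are hypothetical, and the list file must be given as an absolute path because both functions change into the firmware directory.

```shell
# Record checksums of all MT7961 firmware files in directory $1
# into the list file $2 (absolute path).
record_fw_sums() {
    ( cd "$1" && md5sum *MT7961* ) > "$2"
}

# Verify the files in directory $1 against the list file $2 (absolute path).
# md5sum -c exits non-zero if any file is missing or differs.
check_fw_sums() {
    ( cd "$1" && md5sum -c "$2" )
}
```

Possible usage: run `record_fw_sums /lib/firmware/mediatek /tmp/fw.md5` on the working system, copy the list over, and run `check_fw_sums /lib/firmware/mediatek /tmp/fw.md5` on the other. If the files are stored as .zst, the sums are those of the compressed files and only match when both systems compressed them identically.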
I'll compile kernel 6.5 and give it another try.
Unfortunately the system on which I compiled the kernel does not have USB3 hardware.
Now compiling the kernel on a USB3 system. When finished, we will have all combinations of kernels, ehci, xhci and firmware.
Conclusion:
the new firmware loads fine on kernel 6.5
the ERROR is back (after a while):
[ 4213.904348] usb 1-2: device not accepting address 21, error -71
and packet injection is still not working.
As a final test I compiled kernel 6.1 and got the same results.
Now I give up and wait for a driver update.
To make sure it is not a malfunction of my device: is packet injection working on your system (kernel 6.5 and latest firmware)?
$ sudo hcxdumptool -i INTERFACENAME --rds=1 -F
I guess that my device is fine, because the problem occurs on openwrt as well.
ZerBea/hcxdumptool#376
Interesting. It looks to me that you have the makings of additional bug reports. It is also possible that there is a single underlying cause. Hard to say.
The USB subsystem drivers, and especially the USB3 drivers, are not mankind's great invention.
I'm going to try to setup to test with kernel 6.6 and 6.7 tomorrow if I feel better.
I have two test systems in my lab but only one is setup and it is using secure boot which is not going to work with this very well at all so I need to rethink my setup. Will report.
Great, thanks.
My test systems:
2 x Intel (ehci)
2 x AMD (xhci)
5 x Raspberry Pi zero
2 x Raspberry Pi A
2 x Raspberry Pi B
Linux kernel 6.1, 6.5 and 6.6
All tested devices / drivers (the latest tested device only with a driver patch) passed the tests on all systems:
ZerBea/hcxdumptool#361
Except the mt7921u, which suggests to me that my testing environment is OK.
Unfortunately the mt7921u test is time-consuming, because in every case two variables remain (driver and firmware).
Right now, I don't know which of them caused the trouble.
I found the problem.
Unfortunately it is similar to this one:
#778
Driver reports that active monitor mode is possible:
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
But if hcxdumptool sets active monitor mode, it stops working.
If active monitor mode is disabled, everything's fine
0 ERROR(s) during runtime
638 Packet(s) captured by kernel
0 Packet(s) dropped by kernel
1 SHB written to pcapng dumpfile
1 IDB written to pcapng dumpfile
1 ECB written to pcapng dumpfile
83 EPB written to pcapng dumpfile
exit on sigterm
I don't think the problem is related to hcxdumptool, because it can be reproduced with iw, ip link and tshark, too:
$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set type monitor
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
22 packets captured
$ sudo ip link set wlp22s0f0u4i3 down
$ sudo iw dev wlp22s0f0u4i3 set monitor active
$ sudo ip link set wlp22s0f0u4i3 up
$ tshark -i wlp22s0f0u4i3
Capturing on 'wlp22s0f0u4i3'
^C
0 packets captured
Have you modified the original message to reflect this finding?
How does this finding reflect overall? Is packet injection working with active monitor mode off?
I'm a little fuzzy after being sick for so many days. Why is active monitor mode needed?
The head line has been modified.
Packet injection is working like a charm:
ZerBea/hcxdumptool#361 (comment)
Background:
In active monitor mode, the device ACKs incoming frames addressed to the virtual MAC of the device.
This feature is really useful to perform PMKID attacks.
At the moment, active monitor mode is working on:
mt76x0u
mt76x2u
It is not working on:
mt7601u
mt7921u
I see three options:
hcxdumptool does not set active monitor mode by default even if the driver reports that it is supported.
That has been done by this commit:
ZerBea/hcxdumptool@8d3f24e
active monitor mode capability should not be reported by the driver
[code]
mt7601u:
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
mt7921u:
$ iw list | grep active
Device supports active monitor (which will ACK incoming frames)
[/code]
active monitor mode should be fixed by driver
The head line has been modified.
Since this post has followed a long path to get where it is, it might help if you put "Edit:" at the top of the original post and then add what you wrote in your last 2 posts, so that a person who might fix it can understand the problem without having to track things down. As an alternative, it might also work if you close this report and start a clean new post. I'm going to try to consolidate the information and add it to my main mt7921u bug list at my site. It seems quite clear at this point that active monitor is broken and is the cause of the problem.
Done. Important information and the steps to reproduce are now in the first comment.
Thanks for pointing me in this direction.
Looks good. I borrowed some of your work. I'm reworking the BUG thread over at my site. This bug is now posted as the top:
Hopefully this can be fixed.
I hope so, too.
The performance of the interface is enormous (when running latest git head hcxdumptool).
It is now number one:
ZerBea/hcxdumptool#361
Let me explain the difference between active monitor and passive monitor mode:
Suppose hcxdumptool requests an ASSOCIATION by transmitting an ASSOCIATIONREQUEST frame:
active monitor mode:
the target AP responds with an ASSOCIATIONRESPONSE frame and the device that hcxdumptool uses ACKs it
passive monitor mode:
the target AP responds with an ASSOCIATIONRESPONSE frame but due to missing ACK, it transmits up to 7 retries (which will spam the entire channel). That is a huge performance impact, because the channel is busy 7 times longer.
We can close this report. As a workaround, hcxdumptool's active monitor mode is not running by default.
Even if the driver reports that it is possible, the user must allow it by a command line option.
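The workaround boils down to two conditions: the driver advertises active monitor mode and the user explicitly asked for it. A rough shell sketch of that decision follows. It is not hcxdumptool's actual code, and pick_monitor_mode and its arguments are made up for illustration.

```shell
# $1: "yes" if the driver advertises active monitor mode, else "no"
# $2: "yes" if the user explicitly opted in; defaults to "no"
# Prints the monitor mode to use. Hypothetical helper, not hcxdumptool code.
pick_monitor_mode() {
    if [ "$1" = yes ] && [ "${2:-no}" = yes ]; then
        echo active
    else
        echo passive
    fi
}
```

With this logic, an advertised but broken capability (as on mt7921u) is never used unless the user asks for it.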
I haven't seen any action on your report over at OpenWRT. This is something that needs to be fixed. I am neck deep in work right now but when I have a little time, I think I know a guy that can track this down and suggest a patch. Remind me in a couple of weeks if we see no action.
Nick
FYI: I saw new firmware flow into linux-wireless last week so it should be posted for download this week or next. We have been making enough noise that the devs may have taken a look and found a fix.
As usual, the firmware guide is menu item 8, look at section 3 for this chipset:
@morrownr
Problem is not the firmware, because it is the same on all kernels.
On 6.6 it is loaded, on 6.7 not.
Active monitor mode is a nice add on. Latest hcxdumptool received a workaround if the driver reports active monitor mode capabilities but fails on it.
Are you sure you don't want to keep this open?
Once my remaining machine is updated to kernel 6.7, I will no longer be able to run further tests of active monitor mode. Unfortunately.
But I'll reopen this on the next kernel, 6.8 (or 6.9, or whenever the problem mentioned above has been fixed).
hcxdumptool doesn't need active monitor mode any longer.
At least I found a firmware version that is working (for me) on all kernel versions:
Bluetooth: hci1: HW/SW Version: 0x008a008a, Build Time: 20230526131214
mt7921u 1-2:1.3: HW/SW Version: 0x8a108a10, Build Time: 20230526130917a
mt7921u 1-2:1.3: WM Firmware Version: ____010000, Build Time: 20230526130958
Now I can test active monitor mode again (it is still not working),
so I am reopening this issue report as mentioned above.
It's not easy to find the real problem because several variables are in play (driver/firmware combinations).
Wow! I have been seeing a pickup in people reporting problems related to this. It is reasonable to assume if the driver is reporting active monitor mode support, then it will work. They try to use it and end up with a mess.