Ejecting device while USBPcap is running causes BSOD
dontech opened this issue · 31 comments
EDIT: woups: I just got a BSOD when I physically unplugged my device while running N=3, S=2048K test with this PR libusb0.sys. Stop code is: KERNEL_MODE_HEAP_CORRUPTION. I never experienced BSOD with libusb0.sys when unplugging before.
@sonatique I can see that USBPcap appears in your stack trace. USBPcap is notorious for causing problems.
You probably installed it along with Wireshark I guess. Could you uninstall USBPcap and make sure it does not attach to your driver before testing ?
Then when the computer rebooted, I ran another test, and this time I was no longer able to get BSOD despite trying several times to unplug while running transfers.
So you probably found the cause of it, thanks a lot! But anyway, I am still wondering why I never had this before (while having USBPcap installed from the very beginning). It seems to mean that you added something that now "conflicts" somehow with USBPcap. I am concerned that some people that could potentially use a product containing this newer libusb0.sys may have USBPcap installed and therefore suffer instabilities they didn't have before...
- Could you lock down which version the crash occurred?
- Did you make sure you had the latest version of USBPcap installed? Older versions were buggy!
- Could you run the tests with a new version that I am making where PDB files are generated? Then just re-post the DMP file.
https://github.com/mcuee/libusb-win32/releases/tag/test_release_async_order3
This is the same as async_order2, but now with debug symbols. Please make it crash again, as I cannot reproduce it here, and send the DMP file.
Thanks,
/pedro
@sonatique writes:
Hi, thanks. My point is that the only ever version of libusb-win32 (libusb0.sys) I experienced BSOD with is your last experimental one. I used libusb-win32 for years (official release version up to the last one) with USBPCap installed (last version since 2020).
And I am pretty sure that I unplugged device violently more than once during those years.
What I mean is that: yes, USBPcap is somehow responsible for it (since removing it seems to make the issue vanish) but there is something in your last code change that made libusb0.sys "sensitive" to USBPcap, in a way it was not before. I am concerned by the fact that if this code goes into the wild, many people having USBPcap installed could suddenly start having issues they didn't have before... In summary: I would prefer to re-install UBSCap and test with it, such that you could maybe eliminate the code causing the issue...
I would prefer to re-install UBSCap and test with it, such that you could maybe eliminate the code causing the issue...
Yes please do and attach the DMP file here (use the async_order3) driver from above.
@dontech I was about to say I was not able to reproduce anymore, but suddenly it happened. Unfortunately the BSOD stopped at 0% and did not produce a dump file.
I have USBPCap 1.5.3.0 installed and (Wireshark 3.0.8). I did not see issue with USBPCap 1.5.4.0.
I'll keep trying until I get a dump file, thanks for your patience
@dontech here you go, wasn't that difficult actually ;-)
USBPCap 1.5.3.0
Wireshark 3.0.8
test_release_async_order3
3 concurrent transfers of 2MB each, 180MBytes/s net transfer rate
020824-9906-01.dmp
I have USBPCap 1.5.3.0 installed and (Wireshark 3.0.8). I did not see issue with USBPCap 1.5.4.0.
So it goes away in 1.5.4.0 ?
I have USBPCap 1.5.3.0 installed and (Wireshark 3.0.8). I did not see issue with USBPCap 1.5.4.0.
So it goes away in 1.5.4.0 ?
That is what I experienced, yes. But I cannot be 100% sure, it is possible it just happens less frequently...
I also had BSOD with older USBPCap. And 1.5.4.0 is fairly old. Are you sure this is not just a USBPCap bug?
1.5.4.0 is old but is the latest ;-)
Yes, I guess it is a USBPcap bug, that is maybe fixed in 1.5.4.0. I can not tell much, but my point is that even with the older (buggy?) USBPCap, I never experienced BSOD when using libusb0.sys made before your recent experimentations.
So the question was : is it one of the changes in libusb0.sys that triggered a bug in USBPCap, or a bug introduced in libusb0.sys that made it no longer compatible with USBPCap.
Given it seems that the issue is gone when using latest USBPcap, I think the bug was in previous USBPcap. For some reason it was not triggered until now.
I was hoping you could get a definitive answer by looking at my latest dump, but anyway I would say we can close the issue now.
I will do a lots of tests again when you'll have finalized #54 , so if I am seeing something again with latest USBPcap, I'll tell you.
Thanks!
@sonatique No, I also want to know the root of this problem as it could be a real bug, even though the finger points at USBPCap.
I made a mistake when adding the PDB info to the packages, so I have to spin another, as the libusb.dll PDB was overwriting the one for the driver.
Could you run it once more, and attach the crash dump again?
Thanks!
Ok cool, I like that ;-)
Of course I will run another crash, but that will be on Monday.
Here it is:
https://github.com/mcuee/libusb-win32/releases/tag/test_release_async_order4
Good weekend! :-)
@dontech I reproduced BSOD while unplugging using the "debug" version found here: https://github.com/mcuee/libusb-win32/releases/tag/test_release_async_order4 .
Here the minidump. Would you have preferred I would do it with the release version?
021224-10218-01.dmp
Here the minidump. Would you have preferred I would do it with the release version?
No, as long as i have the matching PDB files, all should be good.
Its actually a benefit that its the debug version, as they are built with more relaxed optimization rules, which makes it more likely the debugger can take it apart. Some PE files, if very optimized, can be hard to debug due to all the inline and other tricks the compiler applies.
Crashing stack is this:
fffffa86`ac55d470 fffff801`3a7b70b9 : 00000000`00000000 ffffdf84`00000001 ffffdf84`c000000e 01000000`00100000 : nt!ExFreeHeapPool+0xb7
fffffa86`ac55d550 fffff801`bc44a4b7 : ffffdf84`57fca1a0 ffffdf84`5c12ba20 ffffdf84`51995db0 00000000`00000009 : nt!ExFreePool+0x9
fffffa86`ac55d580 fffff801`bc44385d : ffffdf84`57fca1a0 ffffdf84`5c12ba20 fffffa86`00000001 00000000`00000009 : libusb0!transfer+0x387 [C:\githome\libusb-win32-async-order\libusb\src\driver\transfer.c @ 187]
fffffa86`ac55d600 fffff801`bc441698 : ffffdf84`57fca1a0 ffffdf84`5c12ba20 00000000`00000001 00000000`00000000 : libusb0!dispatch_ioctl+0x54d [C:\githome\libusb-win32-async-order\libusb\src\driver\ioctl.c @ 164]
fffffa86`ac55d7a0 fffff801`3a035cf5 : ffffdf84`57fca050 ffffdf84`5c12ba20 00000000`00000021 ffffdf84`5496f080 : libusb0!dispatch+0xf8 [C:\githome\libusb-win32-async-order\libusb\src\driver\dispatch.c @ 52]
fffffa86`ac55d7f0 fffff801`3a4452ac : 00000000`00000002 00000000`00000000 00000000`0022202e ffffdf84`4d8775d0 : nt!IofCallDriver+0x55
fffffa86`ac55d830 fffff801`3a444f03 : 00000000`0022202e fffffa86`ac55db80 00000000`00040000 00000000`0022202e : nt!IopSynchronousServiceTail+0x34c
fffffa86`ac55d8d0 fffff801`3a4441d6 : 00000000`00000001 00000000`00000000 00000000`00000000 0000023c`731175b0 : nt!IopXxxControlFile+0xd13
fffffa86`ac55da20 fffff801`3a211238 : 00000000`00000000 00000000`00000001 00000000`00000102 ffffdf84`6006a390 : nt!NtDeviceIoControlFile+0x56
fffffa86`ac55da90 00007ffc`9fd6d0c4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x28
000000d6`aacfe418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffc`9fd6d0c4
OK I see the problem.
-
I changed the way we handle the error code for IoCallDriver(), which probably spits out something nasty during ejection.
-
This is clearly wrong, as I do not have a map of which error codes are valid, and which error codes are emitted by the underlying drivers. It was an attempt at preventing leaks, but I should clear wait for the IRP to complete.
-
We end up de-allocating stuff that probably was deallocated or protected at this point by the lower level drivers. You cannot mess around with IRPs after calling IoCallDriver().
-
USBPcap propably changes the timing a bit so this happened more often.
Thanks for the details.
So it is a good thing we hunted this one down to the end!
USBPcap changes timings indeed, and this explains also why it didn't crash every time.
@dontech : yeah, I think you nailed it! I tested with the "debug" archive from here: https://github.com/mcuee/libusb-win32/releases/tag/test_release_usbpcap_bsod and I have NOT been able to reproduce a BSOD no matter I tried. Before, I had one manipulation that was causing BSOD 100% of the time, it is no longer the case, very good!
Just one point: I think this code is not based on the latest version of #54 , because I see transfers completed with max 256KB while I saw full (2048K or more) completion when I last tested #54. Am I correct?
Just one point: I think this code is not based on the latest version of #54 , because I see transfers completed with max 256KB while I saw full (2048K or more) completion when I last tested #54. Am I correct?
No, they are both from master, more or less.
However, I think the regression was introduced with this:
Which causes it to revert to full-speed = 256K per transfer.
Hum, OK, thanks, but I am pretty sure that when I did my experiments (with both the last version from #54 and the last from here) I was at SuperSpeed (5GB), and though I saw 2048K vs 256K.
I think there might be a regression but non only for full-speed...
Yep found the problem. @tormodvolden accidentally filtered win8 targets, when it should have filtered win8 WDK targets (and i missed it in my review)
Fixed with:
This bug can be considered closed now.
Oops, sorry about that. I should have asked explicitly about those NTDDI defines, because I was outright cargo-culting from some MinGW headers without being sure about their precise meaning or availability in "the other" build environment.
Adding that well-explaining "contract" define is much better, thanks!
No worries. I really appreciate your contributions.
@sonatique Please test this new release:
https://github.com/mcuee/libusb-win32/releases/tag/snapshot_1.3.0.2
This should incorporate all the fixes in one.
@dontech : yes In
https://github.com/mcuee/libusb-win32/releases/tag/snapshot_1.3.0.2
This should incorporate all the fixes in one.
Yes indeed! My tests shows that when submitting multiple transfers with large buffers (i.e. > 64K, typically 2MB):
- no more BSOD when unplugging device
- transfers complete at full buffer size (up to 32MB, as you stated)
- no missing data or "corruptions"
That is quite some improvements, thanks a lot!