HPE team - found a stack override problem on libusb-win32
JamesHuangHPE opened this issue · 29 comments
Hi Peter,
This is James from HPE, we recently found a stack override problem on the libusb-win32 driver.
https://sourceforge.net/projects/libusb-win32/
Please review below for details and let us know your opinion and suggestions.
It would be great if you can help us to fix this issue.
I have sent you an email around 8/5.
The problem is the irp->userIosb ptr member which points to local io_status which is allocated locally from the stack.
The kernel updates the 2 qword io_status which is originally on the stack of call_usbd_ex but sometimes the kernel stomps the stack of another routine if call_usbd_ex has already returned when the update happens.
Here’s a lot more detail from libusb_driver.c:
NTSTATUS call_usbd_ex(libusb_device_t *dev, void *urb, ULONG control_code, <-- see note1
int timeout, int max_timeout)
{
KEVENT event;
NTSTATUS status;
IRP *irp;
IO_STACK_LOCATION *next_irp_stack;
LARGE_INTEGER _timeout;
IO_STATUS_BLOCK io_status; <-- see note 2
if (max_timeout > 0 && timeout > max_timeout)
{
timeout = max_timeout;
}
if (timeout <= 0)
timeout = LIBUSB_MAX_CONTROL_TRANSFER_TIMEOUT;
KeInitializeEvent(&event, NotificationEvent, FALSE);
irp = IoBuildDeviceIoControlRequest(control_code, dev->target_device, \
NULL, 0, NULL, 0, TRUE, > see note 3
NULL, &io_status); /
if (!irp)
{
return STATUS_NO_MEMORY;
}
next_irp_stack = IoGetNextIrpStackLocation(irp);
next_irp_stack->Parameters.Others.Argument1 = urb;
next_irp_stack->Parameters.Others.Argument2 = NULL;
IoSetCompletionRoutine(irp, on_usbd_complete, &event, TRUE, TRUE, TRUE);
status = IoCallDriver(dev->target_device, irp); \_ returns STATUS_PENDING so need to wait for irp completion
if(status == STATUS_PENDING) /
{
_timeout.QuadPart = -(timeout * 10000);
if(KeWaitForSingleObject(&event, Executive, KernelMode, \_ wait times-out after specified 500ms -- see note 4
FALSE, &_timeout) == STATUS_TIMEOUT) /
{
USBERR0("request timed out\n");
IoCancelIrp(irp); <-- attempt to cancel the irp because of the time-out
}
}
/* wait until completion routine is called */
KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL); <-- wait (with infinite time-out) for completion
status = irp->IoStatus.Status; <-- see note 5
/* complete the request */
IoCompleteRequest(irp, IO_NO_INCREMENT); <-- see note 6
USBDBG("status = %08Xh\n",status);
return status; <-- see note 7
}
Note 1:
The call_usbd_ex() routine executes all the time but precisely every 60sec there is a unique usb request that always times-out after 500ms and fails:
2: kd> dt _URB ffff868d`0be18570 UrbControlVendorClassRequest.Value
USBPORT!_URB
+0x000 UrbControlVendorClassRequest :
+0x082 Value : 0x3fd <-- 0x3fd = 1021 is a unique usb vendor request code that always times-out
Eaton should be able to tell us what this 0x3fd (1021) request is, why it gets sent every 60sec, and what might cause it to always time-out.
We should check a “working” system to see if this request is being sent and if it always time-out and fails.
If this request didn’t get sent or if it didn’t time-out and fail then we would not have a problem.
Note 2:
The 2 qword IO_STATUS_BLOCK io_status is allocated as a local on the stack of the current routine:
2: kd> dt IO_STATUS_BLOCK Status Information
libusb0!IO_STATUS_BLOCK
+0x000 Status : Int4B <-- low dword of first qword will be updated with ntstatus 0xc0000120 (The I/O request was canceled)
+0x008 Information : Uint8B <-- second qword will be updated with the byte count of the transfer which will be 0x00000000’00000000
This is what gets updated or stomped on the stack and the results are:
- If the writes happen before this routine returns then this is just a normal status block update
- If the writes happen after this routine returns then this stomps the stack possibly of some other routine and timing / stack usage determines if it crashes
Note 3:
The irp is created here. Notice that the &io_status ptr is passed as an arg and it becomes the irp member UserIosb ptr:
2: kd> dt nt!_IRP IoStatus UserIosb
+0x030 IoStatus : _IO_STATUS_BLOCK
+0x048 UserIosb : Ptr64 _IO_STATUS_BLOCK <-- ptr to local io_status allocated on this routine’s stack
The irp that’s created has the following flags:
2: kd> dt nt!_IRP 0xffffe486`c1d56990 Flags
+0x010 Flags : 0x40060000 <-- 0x40000000 => FILE_FLAG_OVERLAPPED => it can be asynchronous (or synchronous)
When the kernel is later called to complete the request, it copies IoStatus into the ptr contents of *UserIosb (ie. io_status) which is the final status info for the irp:
- This is what can stomp the stack of another routine depending upon when it happens
- ie. it might be a synchronous or asynchronous completion
- if it is async then the stack will get stomped later but there will only be a crash if another routine is actively using those stack addrs at that time
Either of the following will work-around the problem: - If io_status is defined as a global (instead of as a local on the stack) then the stack will not get stomped
- If io_status is replaced with a NULL then the kernel will not update user status (*UserIosb) and the stack will not get stomped
o Note: I don’t think it is documented that this ptr can be a NULL ptr but I verified that this works (using kd to hack the assembly code)
Note 4:
This times-out after a specified 500ms wait/time-out (ie. the request/irp event does not get signaled within 500ms).
If I hack the assembly with kd to make it an infinite wait/time-out then it works:
- This means that this usb request (ie. the specific one that happens every 60sec) is taking far too long
- Eaton should be able to tell us what this vendor _URB request (value 0x3fd / 1021) is and why it takes so long
- We should also check a “working” system to see if the same request is being sent every 60sec and if it also times-out
- ie. if this request is not being sent and/or not timing-out (or we can stop it) then the system will not fail
Note 5:
The “status = irp->IoStatus.Status” should really be “status = irp->UserIosb->Status” (where the final status is copied into) but this doesn’t cause a problem.
It does mean that irp->UserIosb is not being used here.
Note 6:
This call (IoCompleteRequest) tells the kernel to finish and do clean-up of the irp. This includes the kernel executing nt!IopCompleteRequest to copy IoStatus into *UserIosb:
- this can happen synchronously by direct calls (w/o an APC int) as shown in one of the callstacks earlier in this email
- this can also happen asynchronously sometime later via an APC int as shown in another of the callstacks earlier in this email
Note 7:
The call_usbd_ex routine returns here and the locals on its stack should not be accessed from this point onwards:
- if the irp was handled synchronously then io_status on the current stack was previously written and will not be accessed again => no problem
- if the irp is being handled asynchronously then io_status on the stack might be written sometime after the return (on some other routine’s stack):
o if the APC int and the write of io_status happens before the penter() check then the stack is stomped w/o problems because the addr is not yet in use
o if the APC int and the write of io_status happens after the penter() check then the stack is stomped w/o problems if the addr is not currently used
o if the APC int and the write of io_status happens near the penter() check then the args being passed up the stack during that time window get stomped
this last case is where we see 1 or 2 stomped args on the stack causing a penter trap or crash
ie. stack addr of old qword io_status.Information stomped w/ 0 and old dword io_status.Status stomped w/ ntstatus 0xc0000120
Best Regards,
James
Sorry but I do not see any emails in libusb-win32 mailing list. You need to subscribe to the mailing list in order to post.
@dontech Just wondering if you have some time to take a look whether this is a real issue or not.
@JamesHuangHPE
Have you tried out the latest snapshot version here? Take note it is unsigned version so you may have to use a test machine and enable test signing.
https://sourceforge.net/projects/libusb-win32/files/libusb-win32-snapshots/20190918/
https://docs.microsoft.com/en-us/windows-hardware/drivers/install/the-testsigning-boot-configuration-option
Just wondering if it is possible for you to change to libusbk driver. It is difficult for use to support libusb-win32 driver now.
Even for libusbk driver, we are shipping the old version because of the new Windows 10 kernel driver requirements. We have switched the focus on the libusbk.dll and not updating the libusbk.sys. WinUSB driver should be the way to go.
@JamesHuangHPE
Have you tried out the latest snapshot version here? Take note it is unsigned version so you may have to use a test machine and enable test signing.
https://sourceforge.net/projects/libusb-win32/files/libusb-win32-snapshots/20190918/
https://docs.microsoft.com/en-us/windows-hardware/drivers/install/the-testsigning-boot-configuration-optionJust wondering if it is possible for you to change to libusbk driver. It is difficult for use to support libusb-win32 driver now.
The issue also existing in latest version.
@JamesHuangHPE I will strongly encourage you to try out libusbk driver to see if that helps. libusbk.sys is almost a drop in replacement for libusb0.sys for most of the devices. You do not need to change your application as libusb0.dll is still working with libusbk.sys.
You can use the libusbk-inf-wizard to replace the driver.
Ref: libusbk.sys vs libusb0.sys vs WinUSB
http://libusbk.sourceforge.net/UsbK3/usbk_comparisons.html
Any update on this?
James, did you try the code i posted?
Hi Dontech,
Where did you post the code?
Is there any link?
Thank you.
James
Hi James,
When I meant "the code" I meant the suggestion above which I wrote on Aug 15.
I will need to modify the driver to try it, but since you already have done that before, it should not be a problem.
Thanks,
/pedro
hi Pedro,
Could you please modify the driver and provide it to us for testing?
I did not do the modification before, it was done by other engineer before.
Thank you.
James
Hi James,
I am not sure when or if i have the time for this. Furthermore, i am not sure i can reproduce the problem here.
I will see what i can do, but i cannot guarantee when i will have time for this.
If you need it done within a reasonable time-frame, i would suggest you allocate some resources for it.
Thanks,
/pedro
Hi James,
-
Please review the changes i have made thoroughly.
-
Please try these binaries. You will need to disable driver signing requirements or re-sign them for testing (using your own inf).
Thanks,
/pedro
James, please update this.
Hi Peter,
The HPE SW expert is reviewing the modification.
should have some update later.
Thank you.
James
Hi James,
Any update on this? Does the fix work?
Hi Peter,
Sorry for the late reply, we are seeking some other way to fix this issue.
I will provide more response once the solution is confirmed.
Thank you for your support.
Best Regards,
James
OK, but any update on why this does not work would be appreciated.
@JamesHuangHPE
Please try the latest snapshot and check whether this issue has been fixed. Thanks.
James, could you/HPE report back on this?
Lets get this one closed, so we are sure we have it right for the next release.
Hi Peter,
Sorry for the late response.
We found other solution on the application software, fix the URB request timeout problem to cause the USB driver enter error path to cause this issue.
Thank you for you update, and please close this case.
Best Regards,
James
OK The fix is still valid however, as the driver is now more safe from BSODs.
Thanks,
/pedro
Hi Peter,
Thank you for all your response and driver updated.
Really appreciate your support.
Best Regards,
James