virtio-win/kvm-guest-drivers-windows

Balloon Service 0x80041032

Svra44 opened this issue · 11 comments

Balloon Service is generating the following WMI errorID:5858
"
Id = {00000000-0000-0000-0000-000000000000}; ClientMachine = HOSTNAME; User = NT AUTHORITY\SYSTEM; ClientProcessId = 964; Component = Unknown; Operation = Start IWbemServices::ExecQuery - root\cimv2 : SELECT * FROM Win32_PerfRawData_PerfOS_Memory; ResultCode = 0x80041032; PossibleCause = Unknown
"

Version: virtio-win-0.1.204-1

Host OS: Proxmox and ArchLinux

Guest OS: Windows Server 2016 (proxmox) & Windows 10 (archlinux)

On Windows 10, the WMI service was also spiking CPU Usage due to the WMI error. Stopping the Balloon service or restarting "Windows Management Instrumentation" service stops the CPU spikes until reboot.
Manually running "Get-WMIObject Win32_PerfRawData_PerfOS_Memory" from elevated powershell returns the expected output & doesnt cause any errorID 5858

Following KB indicates issue could be related to lack of timeout: https://docs.microsoft.com/en-us/troubleshoot/windows-client/system-management-components/wmi-activity-event-5858-logged-with-resultcode-0x80041032

@Svra44
Hi, I want to ask the detailed steps to reproduce this issue, in my env I didn't reproduce it.
Host: RHEL8
Guest:Win10-64(10.0.19043)
virtio-win-0.1-204

Steps:

  1. Boot up vm with balloon device, machine type is q35.
  2. Install balloon driver and start balloon service
    blnsvr.exe -i
  3. enable polling in qmp:
    {"execute":"qmp_capabilities"}
    {"execute": "qom-set","arguments": { "path":
    "/machine/peripheral/balloon0","property": "guest-stats-polling-interval",
    "value": 2 } }
    { "execute": "qom-get","arguments": { "path":
    "/machine/peripheral/balloon0","property": "guest-stats" } }
  4. Balloon memory via qmp
  5. Check EVENT VIEWER in guest, didn't find any error log

@xiagao ,
There arent any specific steps to reproduce. I am seeing it on each of the VMs Ive checked so far. They would have been standard 10/Server2016 installs and then having virtio-win-gt-x64 0.1.173 installed.

Initially I identified the error because of a system having performance issues (running 0.1.173 blnsvr installed via virtio-win-gt-x64.msi I believe), and tracked the issue down to wmiprvse.exe CPU spikes which stopped once blnsvr service was stopped (& the 5858 errors stopped occurring).

Ive also tried uninstalling blnsvr and re-installing using 0.1.204 and have seen no change.
I am also seeing the same on a q35 VM, though most of my VMs are i440fx.

I setup a test VM using q35 with 21H1 19043.928 Windows 10 Pro N iso, copied "Balloon\w10\amd64*" from virtio-win-0.1.204.iso to the C Drive and installed via "blnsvr.exe -i". I didnt see the errors occur, though as per below continuation, that may just be because I didnt wait long enough.
I then uninstalled blnsvr.exe and installed Balloon Service via virtio-win-gt-x64 and began seeing the errors on this new VM, though they took about 5 minutes after initial upstart of blnsvr.exe to start appearing in Microsoft-Windows-WMI-Activity, and seem to occur less frequently(~1 minute in intervals) than some other VMs I have checked.
Checking the file SHA1 hashes virtio-win-gt-x64 installs the correct w10 files, however it doesnt include the .pdb files (though manually copying them over doesnt change the behaviour).
Uninstalling Virtio-win-driver and re-running "blnsvr.exe -i" using the manually copied over files again doesnt seem to produce the 5858 errors. Seems tied to installs via virtio-win-gt-x64?

You would need to specifically check for the error under Microsoft-Windows-WMI-Activity if you havnt already.
The reference "ClientProcessId = 7364" of the below example event also corresponds to blnsvr PID.

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
  <Provider Name="Microsoft-Windows-WMI-Activity" Guid="{1418ef04-b0b4-4623-bf7e-d74ab47bbdaa}" /> 
  <EventID>5858</EventID> 
  <Version>0</Version> 
  <Level>2</Level> 
  <Task>0</Task> 
  <Opcode>0</Opcode> 
  <Keywords>0x4000000000000000</Keywords> 
  <TimeCreated SystemTime="2021-08-02T21:17:13.2462129Z" /> 
  <EventRecordID>670</EventRecordID> 
  <Correlation /> 
  <Execution ProcessID="2892" ThreadID="804" /> 
  <Channel>Microsoft-Windows-WMI-Activity/Operational</Channel> 
  <Computer>DESKTOP</Computer> 
  <Security UserID="S-1-5-18" /> 
</System>
<UserData>
  <Operation_ClientFailure xmlns="http://manifests.microsoft.com/win/2006/windows/WMI">
  <Id>{00000000-0000-0000-0000-000000000000}</Id> 
  <ClientMachine>DESKTOP</ClientMachine> 
  <User>NT AUTHORITY\SYSTEM</User> 
  <ClientProcessId>7364</ClientProcessId> 
  <Component>Unknown</Component> 
  <Operation>Start IWbemServices::ExecQuery - root\cimv2 : SELECT * FROM Win32_PerfRawData_PerfOS_Memory</Operation> 
  <ResultCode>0x80041032</ResultCode> 
  <PossibleCause>Unknown</PossibleCause> 
  </Operation_ClientFailure>
</UserData>
</Event>

@Svra44 ,
Yes, you are right.
Only hit this error when installing Balloon Service via virtio-win-gt-x64.Could not reproduce via 'blnsvr.exe -i'.
But I didn't have CPU spikes when get this error.
We will track this issue internally.
Thanks.

I came here trying to track down the 0x80041032 I saw flooding my event log. As a datapoint, Microsoft has this to say on the cause of this error:

WMI-Activity Error 5858 with ResultCode = 0x80041032 (WBEM_E_CALL_CANCELLED) indicates that the WMI caller has successfully issued IWbemServices:ExecQuery, but has released the IWbemContext object before retrieving the full result set using the IEnumWbemClassObject::Next method. If the WMI service is still holding data for the client when the client terminates the link (by releasing the IWbemContext object), this event will be logged.

(source: https://learn.microsoft.com/en-us/troubleshoot/windows-client/system-management-components/wmi-activity-event-5858-logged-with-resultcode-0x80041032)

I noticed that enumerator does not get enumerator->Release() called before going out of scope (at least, as far as I can see: I'm no C++ guru). Maybe that omission causes the events?

I do not have an environment in which I can build the exe, but I'd be happy to test it if it becomes available.

@kostyanf14 please check the last comment

@driehuis

The enumerator variable

CComPtr< IEnumWbemClassObject > enumerator;
is a CComPtr object. CComPtr is a smart pointer. According to Microsoft docs, we should not call enumerator->Release()

Ah, so it was my limited C++ knowledge after all :-) I'll see if I can build the binary and toy with it to see if I can isolate the problem, seeing that I seem to be able to reproduce the issue easily.

@driehuis

You need Windows 10+/Windows Server 2016+ and EWDK Windows 11 21H2 (ISO) to build a binary. Detailed instructions are in the wiki: https://github.com/virtio-win/kvm-guest-drivers-windows/wiki/Building-the-drivers-using-Windows-11-21H2-EWDK

Got it built. Unfortunately, the reboot involved in getting the environment set up seems to have cleared the condition, so I'll first have to wait for the issue to be visible again. Dang.

The problems I had reproducing the issue were caused by delays in my logging pipeline, so ignore the presumed "clearing of the condition". The Event Viewer shows the issue immediately in the WMI-Activity channel, if you know where to find it. I added some code to consume the entire enumerator, and the problem goes away. If I run my home-built blnsvr.exe without the patch, I reproducibly get a ton of events.

I can only assume that the WMI API only cleans up its internal context after the entire enumerator is consumed, and randomly throws a hissy fit about it in the event log or not depending on timing.

I'm an old fashioned K&R C coder, so please review pull request #939 without assuming I will do anything right without prodding :-) I have not been able to do any formal testing either.

Errrm, scratch that. The problem is back in a different incarnation:
Id = {BD897EE8-24C6-4355-891D-EE51E446AD7D}; ClientMachine = PLB-WS001; User = NT AUTHORITY\SYSTEM; ClientProcessId = 7760; Component = Unknown; Operation = Start IWbemServices::DeleteInstance - Root\Rsop\Computer : RSOP_ExtensionStatus.extensionGuid="{BC75B1ED-5833-4858-9BB8-CBF0B166DF9D}"; ResultCode = 0x80041002; PossibleCause = Unknown

Update: this spammage occurs when the system is going down for a restart. Not even sure that PID 7760 was the blnsvr at the time. There are also other tools spamming the WMI event log. So, the diff still seems to be okay. Sorry for crying wolf...