microsoft/diskspd

New-Fleet - "Cannot connect CIM server" since upgrading to 2.0.2.1

gjvanzyl opened this issue · 11 comments

HCI01A: Cannot connect to CIM server. WinRM cannot process the request. The following error with errorcode 0x8009030e occurred while using Kerberos authentication: A specified logon session does not exist. It may
already have been terminated.
Possible causes are:
-The user name or password specified are invalid.
-Kerberos is used when no authentication method and no user name are specified.
-Kerberos accepts domain user names, but not local user names.
-The Service Principal Name (SPN) for the remote computer name and port does not exist.
-The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
-Check the Event Viewer for events related to authentication.
-Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport.
Note that computers in the TrustedHosts list might not be authenticated.
-For more information about WinRM configuration, run the following command: winrm help config.
+ CategoryInfo : ResourceUnavailable: (MSFT_Volume:String) [Get-Volume], CimJobException
+ FullyQualifiedErrorId : CimJob_BrokenCimSession,Get-Volume
+ PSComputerName : HCI01C

HCI01A: Cannot connect to CIM server. WinRM cannot process the request. The following error with errorcode 0x8009030e occurred while using Kerberos authentication: A specified logon session does not exist. It may
already have been terminated.
Possible causes are:
-The user name or password specified are invalid.
-Kerberos is used when no authentication method and no user name are specified.
-Kerberos accepts domain user names, but not local user names.
-The Service Principal Name (SPN) for the remote computer name and port does not exist.
-The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
-Check the Event Viewer for events related to authentication.
-Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport.
Note that computers in the TrustedHosts list might not be authenticated.
-For more information about WinRM configuration, run the following command: winrm help config.
+ CategoryInfo : ResourceUnavailable: (MSFT_Volume:String) [Get-Volume], CimJobException
+ FullyQualifiedErrorId : CimJob_BrokenCimSession,Get-Volume
+ PSComputerName : HCI01B

I can confirm that "Get-Volume -CIMSession NODENAME" works without issue against both of the other nodes in the 3-node cluster, so WinRM with Kerberos auth seems to be working fine. I can also Enter-PSSession to both other nodes without issue. It's only New-Fleet that fails when enumerating the remote nodes. The local node still deploys its VMs and disks.

dl2n commented

Where is New-Fleet running with respect to the cluster (on a cluster node, or remote) and is New-Fleet running in an already-remoted session (e.g., Enter-PSSession, Invoke-Command -ComputerName foo { New-Fleet })? Can you provide the full command line?

On one of the cluster nodes. I tried running it on each of the 3 nodes; they all throw the same error.

Full command:

New-Fleet -basevhd "C:\ClusterStorage\collect\tools\gold.vhdx" -adminpass PASSWORD -connectuser DOMAIN\USERNAME -connectpass DOMAINPASSWORD

All other CIM/WinRM remoting is working fine. Command itself was working fine on all nodes before upgrade.

Any suggestions @dl2n?

I have now purged everything and reinstalled from scratch:

Remove-VMFleet
Deleted everything in the "collect" CSV
Uninstall-Module VMFleet -Force
Install-Module VMFleet
Install-Fleet -Force
Copied "gold" image to C:\ClusterStorage\collect\Tools\gold.vhdx
New-Fleet -basevhd "C:\ClusterStorage\collect\tools\gold.vhdx" -adminpass goldpassword -connectuser domain\user -connectpass domainpassword

Same issue. I have 3 nodes: HCI01A, HCI01B, and HCI01C. The VMFleet module is installed on all 3 nodes. When I execute New-Fleet from HCI01B (or from any other node, after running Remove-Fleet first), it throws these errors and then creates VMs only on HCI01A, not on the executing node HCI01B or on HCI01C. I can use the -VM switch to create 3x the VMs (48 cores per node, so 144 VMs) and then use Move-VMStorage to migrate the relevant VMs to the other CSVs for testing, but it's a very tedious process. I can confirm that all other CIM/WinRM remoting works fine, such as Get-Volume -CIMSession from any node to any other node, or Enter-PSSession -ComputerName from any node to any other node.

New-Fleet output:

HCI01A: Cannot connect to CIM server. WinRM cannot process the request. The following error with errorcode 0x8009030e occurred while using Kerberos authentication: A specified logon session does not exist. It may
already have been terminated.
Possible causes are:
-The user name or password specified are invalid.
-Kerberos is used when no authentication method and no user name are specified.
-Kerberos accepts domain user names, but not local user names.
-The Service Principal Name (SPN) for the remote computer name and port does not exist.
-The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
-Check the Event Viewer for events related to authentication.
-Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport.
Note that computers in the TrustedHosts list might not be authenticated.
-For more information about WinRM configuration, run the following command: winrm help config.
+ CategoryInfo : ResourceUnavailable: (MSFT_Volume:String) [Get-Volume], CimJobException
+ FullyQualifiedErrorId : CimJob_BrokenCimSession,Get-Volume
+ PSComputerName : HCI01B

create vm vm-base-HCI01A-001 @ path C:\ClusterStorage\HCI01A\vm-base-HCI01A-001 with vhd C:\ClusterStorage\HCI01A\vm-base-HCI01A-001\vm-base-HCI01A-001.vhdx
HCI01A: Cannot connect to CIM server. WinRM cannot process the request. The following error with errorcode 0x8009030e occurred while using Kerberos authentication: A specified logon session does not exist. It may
already have been terminated.
Possible causes are:
-The user name or password specified are invalid.
-Kerberos is used when no authentication method and no user name are specified.
-Kerberos accepts domain user names, but not local user names.
-The Service Principal Name (SPN) for the remote computer name and port does not exist.
-The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
-Check the Event Viewer for events related to authentication.
-Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport.
Note that computers in the TrustedHosts list might not be authenticated.
-For more information about WinRM configuration, run the following command: winrm help config.
+ CategoryInfo : ResourceUnavailable: (MSFT_Volume:String) [Get-Volume], CimJobException
+ FullyQualifiedErrorId : CimJob_BrokenCimSession,Get-Volume
+ PSComputerName : HCI01C

specialize C:\ClusterStorage\HCI01A\vm-base-HCI01A-001\vm-base-HCI01A-001.vhdx
create vm vm-base-HCI01A-002 @ path C:\ClusterStorage\HCI01A\vm-base-HCI01A-002 with vhd C:\ClusterStorage\HCI01A\vm-base-HCI01A-002\vm-base-HCI01A-002.vhdx
specialize C:\ClusterStorage\HCI01A\vm-base-HCI01A-002\vm-base-HCI01A-002.vhdx
create vm vm-base-HCI01A-003 @ path C:\ClusterStorage\HCI01A\vm-base-HCI01A-003 with vhd C:\ClusterStorage\HCI01A\vm-base-HCI01A-003\vm-base-HCI01A-003.vhdx

I've got the exact same problem.

dl2n commented

I've isolated the issue and am testing the fix. Release will probably be Monday 11/23 to GitHub, shortly thereafter to PS Gallery.

@gjvanzyl @ZoomImpulse

If you cannot wait for the changes of @dl2n you could remove the ' -CimSession $node' on line 1147 in VMFleet.psm1.
Don't forget to re-import the module.
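
A minimal sketch of locating the file and reloading the module after the edit, assuming VMFleet was installed with Install-Module. The install path varies by scope and version, and line 1147 applies to the 2.0.2.1 release discussed here, so check the line before changing it:

```powershell
# Find the newest installed VMFleet and print the path of its script module file.
$module = Get-Module -ListAvailable -Name VMFleet |
    Sort-Object Version -Descending |
    Select-Object -First 1
Join-Path $module.ModuleBase 'VMFleet.psm1'

# After editing the file, reload the module in the current session.
# -Force re-imports it even though it is already loaded.
Import-Module VMFleet -Force
```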

@DarrylvanderPeijl
Thanks for the hint. In the meantime I downgraded VMFleet to 2.0.0.1 instead, using "Install-Module VMFleet -RequiredVersion 2.0.0.1".

> @gjvanzyl @ZoomImpulse
>
> If you cannot wait for the changes of @dl2n you could remove the ' -CimSession $node' on line 1147 in VMFleet.psm1. Don't forget to re-import the module.

Worked like a charm. Thank you!

dl2n commented

This should be fixed in 2.0.2.2. The node job parallelization in New-Fleet cannot itself remote again; doing so runs into Kerberos delegation issues.
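
The behavior described here matches what is commonly called the Kerberos "double-hop" limitation: code already running in a remote session has, by default, no delegated credentials for a further hop. A rough sketch for reproducing it by hand, using the node names from this thread. That New-Fleet's per-node jobs behave like the Invoke-Command call below is an assumption based on the comment above, not a reading of the module source:

```powershell
# One hop, run interactively on HCI01A: this is the case reported as working
# earlier in the thread.
Get-Volume -CimSession HCI01B

# Two hops: HCI01A -> HCI01B -> HCI01C. The inner call runs inside a remote
# session on HCI01B, which by default cannot pass the caller's credentials on
# to HCI01C. Unless CredSSP or Kerberos delegation is configured, this is
# expected to fail with a CIM/WinRM authentication error.
Invoke-Command -ComputerName HCI01B -ScriptBlock {
    Get-Volume -CimSession HCI01C
}
```

This would also explain why the manual checks (Get-Volume -CIMSession, Enter-PSSession) succeed: each of them is a single hop from an interactive logon.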

@dl2n Could you please release 2.0.2.2 to PSGallery? Thanks!