microsoft/diskspd

Measure-FleetCoreWorkload

MLECOQ56 opened this issue · 5 comments

Hi,

When I run this command, I get the following error:

VERBOSE: Operation '' complete.
WARNING: .: 2021/09/29-08:45:13.844 Node NCY-HYV1: No disks found to be used for cache
WARNING: .: 2021/09/29-08:45:13.844 Node NCY-HYV2: No disks found to be used for cache
2021-09-29T08:45:14: READ CACHE WARMUP: capacity devices are not caching reads, no warmup to perform
2021-09-29T08:45:19: Starting run for ComputeTemplate=A1v2 DataDiskBytes=153545080832 FleetVMPercent=100 MemoryStartupBytes=2147483648 PowerScheme=HighPerformance ProcessorCount=1 VMAlignmentPct=100 Workload=General4KWriteRatio0
2021-09-29T08:45:20: START Go Epoch: 1
2021-09-29T08:45:20: CLEAR pause at Go
2021-09-29T08:45:23: SLEEP TO RUN CHECK (17.44 seconds)
2021-09-29T08:45:40: RUN CHECK Go Epoch: 1
2021-09-29T08:45:40: ERROR: done-vm-base-NCY-HYV1-004 is already done
2021-09-29T08:45:40: ERROR: done-vm-base-NCY-HYV1-005 is already done
2021-09-29T08:45:40: ERROR: done-vm-base-NCY-HYV1-011 is already done
2021-09-29T08:45:40: RUN CHECK 1 : 3/32 done (0.02s, total 0.02s)
Transcript stopped, output file is C:\ClusterStorage\collect\result\coreworkload.log
Unexpected early completion of load, please check profile and virtual machines for errors
At C:\Program Files\WindowsPowerShell\Modules\vmfleet\2.0.0.1\VMFleet.psm1:4961 char:17
+ ... throw "Unexpected early completion of load, please check ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Unexpected earl...ines for errors:String) [], RuntimeException
    + FullyQualifiedErrorId : Unexpected early completion of load, please check profile and virtual machines for errors
dl2n commented

Acknowledging.

When this happens, can you please connect to one of those VMs (virtmgmt.msc / vmconnect) and show me what the PowerShell controller's log output reports?

In the near to medium term, I plan to provide a way to extract the controller logs directly, which will make this diagnosis easier.

I encountered the same issue, and the VM console shows "STOPPING (reason: new run file)".

Error:
[screenshot]

[screenshot]

collect folder:
[screenshot]

Normal VM:
[screenshot]

Hey guys,

I'm getting the same error, but in different tests and on different VMs. I've already tried 6, 12, and 16 VMs per node, and it makes no difference.
Yesterday the test got as far as epoch 18 or 19; today it stops with this error as early as epoch 2 to 7.
Does anyone have an idea how to troubleshoot or fix this?

I added "sleep 10" at line 438 of control.ps1, and the "XXX is already done" error no longer appears.
[screenshot]

dl2n commented

If you can confirm this is occurring on Windows Server 2019 ("RS5"): this is fixed in 2.0.2, where RS5 is now supported and confirmed to work.