microsoft/diskspd

Higher than expected random write IOPS

hazemawadalla opened this issue · 9 comments

I created a script that tests the random read/write performance of SSDs after sequential and random preconditioning. The test basically runs these functions in order (a sketch of the XML-processing step follows the list):

Create_NTFS_Volume
Sequential_Precondition
Random_Precondition
Random_Workload
Sequential_Precondition
Sequential_Workload
Process_XML
Delete_NTFS_Volume
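
Since the Random_Workload step requests XML output (-Rxml below) that Process_XML then parses, here is a minimal sketch of what that parsing step might look like. The function body is my assumption, not the poster's code, and the element names (Results, TimeSpan, TestTimeSeconds, Thread, Target, IOCount) are from memory of the diskspd XML schema, so verify them against your own output:

function Process_XML {
    param([string]$XmlPath)   # path to the file captured via -RedirectStandardOutput
    [xml]$result = Get-Content $XmlPath
    $span = $result.Results.TimeSpan
    # Sum I/O counts across all threads and targets, then divide by elapsed time.
    $ioCount = ($span.Thread.Target | ForEach-Object { [long]($_.IOCount) } |
                Measure-Object -Sum).Sum
    $iops = $ioCount / [double]$span.TestTimeSeconds
    "Total IOPS: {0:N0}" -f $iops
}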

The sequential precondition flags:
$sp = Start-Process -NoNewWindow -FilePath "$Diskspdpath" -ArgumentList "-b128k -d9000 -o128 -t1 -Suw -w100 -L -c$FileSize $DataFile" -PassThru
$sp.WaitForExit()
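
For readers skimming the switches, the same preconditioning call can be restated with the arguments spelled out (flag meanings are per the diskspd documentation; this is only a restatement, not a change to the poster's script):

$seqArgs = @(
    "-b128k",       # 128 KiB block size
    "-d9000",       # run for 9000 seconds
    "-o128",        # 128 outstanding I/Os per thread
    "-t1",          # single thread
    "-Suw",         # disable software caching (-Su) plus write-through (-Sw)
    "-w100",        # 100% writes
    "-L",           # collect latency statistics
    "-c$FileSize",  # create the target file at this size
    "$DataFile"
) -join ' '
$sp = Start-Process -NoNewWindow -FilePath "$Diskspdpath" -ArgumentList $seqArgs -PassThru
$sp.WaitForExit()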

The random precondition flags:

$rp = Start-Process -NoNewWindow -FilePath "$Diskspdpath" -ArgumentList "-b4k -d9000 -o32 -t4 -Suw -r -w100 -L -c$FileSize $DataFile" -PassThru
$rp.WaitForExit()

The random workload flags:
$p = Start-Process -NoNewWindow -FilePath "$Diskspdpath" -ArgumentList "-b$bs -d$Time -o$qdepth -t$thread -Suw -r -Rxml -w$wp -L -c$FileSize $DataFile" -RedirectStandardOutput $stdOutLog -RedirectStandardError $stdErrLog -PassThru
$p.WaitForExit()
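
For completeness, the switches here that differ from the preconditioning calls (again per the diskspd documentation):

# -r       random I/O (offsets aligned to the block size) instead of sequential
# -Rxml    emit results as XML, which the Process_XML step then parses
# -b$bs -d$Time -o$qdepth -t$thread -w$wp
#          block size, duration, queue depth, thread count and write mix are
#          parameterized so the outer script can sweep them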

However, I am getting much higher random write IOPS than expected, far higher than the SSD is capable of. Is there something I am missing in the switches?

hazemawadalla commented

Hi Dan,

After you mentioned it, I tested without deleting the file between steps, and I'm still getting high random write numbers. I work for a drive manufacturer, and we use fio for QoS testing. I will attach the fio results (which the drive was specced against), compared to the latest run of my script.

The 4K random write at QD32 with fio is 25,400 IOPS, compared to 81,841 IOPS with diskspd. Can you help me understand the discrepancy?

Kingston_SEDC500R7680_7680G_J2.9 2-sample comparison_fiov317.xlsx

diskspdresults.zip

hazemawadalla commented

Hey Dan,
We designed the drive to be a read-intensive SSD capable of 99,000 read / 26,000 write IOPS at 4K, QD32, 1 thread.

dl2n commented

I replied with this by the email connector on the 21st, but it appears not to have made it.

I pulled up all of the results (thanks for including the full sweep in XML form!).

Focusing on diskspeedrandom_1_100_4k_32, it looks OK: single thread QD32 random 4KB/4KB 100% unbuffered writethrough write. Load is to a ~7.5TB file which I assume is your device ~fully allocated.

The one thing that occurs to me is that you're using the default non-zero but constant fill pattern for the write buffer source (bytes are 0 - 1 - 2 - 3 .... - 255, repeating). Does your device have intelligence to detect constant buffer fill and optimize/dedup the operations? I'm not sure what FIO's default is, but if it is random or at least a pattern your device may not recognize in the IO path, that may be the difference.
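
To make that concrete, here is a small sketch that reproduces the repeating fill pattern Dan describes (diskspd builds this buffer internally; the snippet is only illustrative):

# Reproduce the default 0..255 repeating fill for a 4 KiB write buffer.
$buffer = [byte[]]::new(4096)
for ($i = 0; $i -lt $buffer.Length; $i++) { $buffer[$i] = $i % 256 }
# Every block written carries the same 256-byte cycle, so a controller that
# detects constant or repeating content can compress/dedup the writes.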

If your device does have this intelligence, try this to overcome it: DISKSPD supports creating a random write source buffer with the -Z switch. A size on the order of a few tens of MiB is usually plenty. In DISKSPD 2.0.17a the write source is chosen at 4-byte-aligned offsets within the buffer; in the most recent release it is 512-byte aligned, to avoid processor architectural effects of sub-cacheline-aligned buffers (several % overhead in certain cases).
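
For example, a hedged sketch of how the random-workload call from this issue could pick up a random-content source buffer (32 MiB is just one plausible size in that range):

$p = Start-Process -NoNewWindow -FilePath "$Diskspdpath" `
    -ArgumentList "-b$bs -d$Time -o$qdepth -t$thread -Suw -r -Z32M -Rxml -w$wp -L -c$FileSize $DataFile" `
    -RedirectStandardOutput $stdOutLog -RedirectStandardError $stdErrLog -PassThru
$p.WaitForExit()
# -Z32M gives diskspd a 32 MiB source buffer of random data for writes,
# defeating any constant-pattern detection in the drive.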

Last, if you can get the host interface statistics, that should strongly narrow down where the disconnect is.

I also ran into the same problem: the results are way higher than what iometer and fio show.
diskspd.exe -b4K -t16 -r -o16 -d39 -Sh E:\iobw.tst
Ran on EPYC2-7702 and PM1725a, with a 20GB file of pseudo-random content created with iometer. All three tests were done with the same file.

7702_windows_5x

However, the same tests show similar results (about 800K IOPS) on Intel platforms.

dl2n commented

@hazemawadalla has likely root-caused his issue offline. It has to do with differences in SSD preconditioning methodology between the two specific devices/platforms he was making his comparative runs on. It will take about a week to make the runs to confirm, but I suspect that will close his specific issue.

If you open a separate issue, we can see about root causing yours.

@dl2n Thx, I've posted a new issue.

This is not an issue with diskspd per se, but a limitation. SSD preconditioning by capacity would be an essential enhancement to diskspd. It is difficult to quantify SSD performance at steady state vs. fresh-out-of-box (FOB), especially with larger SSD capacities. With fio (and even other tools, like iometer) you can specify --loops=$, which ensures all LBAs are written to more than once. The SNIA spec recommends writing 2x the capacity, and diskspd has no graceful way of doing that.

There is a workaround, but it requires very tedious coding: keep track of the total bytes written using Windows perfmon and stop the random/sequential precondition process once you hit 2x capacity. Another, easier workaround: if you know your device's approximate data rate for random or sequential workloads, you can calculate the approximate amount of time it would take to precondition the device, as sketched below.
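
A back-of-the-envelope sketch of that time-based approach (the capacity and throughput values below are placeholders, not measurements from any particular drive):

# Estimate the -d duration needed to write roughly 2x the device capacity,
# per the SNIA-style guidance mentioned above.
$CapacityBytes    = 7680GB      # placeholder: usable capacity of the device under test
$WriteBytesPerSec = 100MB       # placeholder: assumed steady preconditioning throughput
$Passes           = 2           # write every LBA roughly twice
$PrecondSeconds   = [math]::Ceiling(($CapacityBytes * $Passes) / $WriteBytesPerSec)
# Pass the result to diskspd as "-d$PrecondSeconds" in place of a fixed -d9000.
"Precondition for roughly $PrecondSeconds seconds"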