spdk_perf Performance
YuanWangZai opened this issue · 6 comments
Hello, I've been using NVMeVirt recently to build emulated SSD devices for some I/O API research. When the device is built with CONFIG_NVMEVIRT_SSD, testing it with spdk_perf yields similar IOPS across thread counts (consistent with the description in the official SPDK documentation).
However, when I build NVMeVirt with the CONFIG_NVMEVIRT_NVM option, the spdk_perf results noticeably decrease as the number of threads increases, and I haven't found any relevant explanation.
Before reaching out to the SPDK community for assistance, I want to confirm whether this could be due to the emulator configuration. Thanks again.
Could you let us know more details about your NVMeVirt configurations?
Also, could you please share your spdk perf configurations?
Thanks for your reply. I haven't modified any NVMeVirt code. Due to limited memory, I allocated 4GB for the emulated SSD. I'm using the current master branch of SPDK. Below is the script I used for testing with spdk_perf:
```sh
# SPDK_PERF_PATH and RESULT are defined elsewhere in my environment.
declare -a num_threads=("1" "2" "3" "4" "5" "6" "7" "8")
# Index 0 is a placeholder; spdk_mask[$t] is a core mask with $t bits set (cores counted down from 7).
declare -a spdk_mask=("placeholder" "80" "c0" "e0" "f0" "f8" "fc" "fe" "ff")
for t in "${num_threads[@]}"
do
    $SPDK_PERF_PATH -q 128 -o 4096 -w randread -r 'trtype:PCIe traddr:0001.10.00.0' -t 20 -a 5 -c ${spdk_mask[$t]} > $RESULT/spdk_perf_t_${t}.txt
done
```
Here are some raw output results with the CONFIG_NVMEVIRT_NVM option:
- num_threads=1
Initializing NVMe Controllers
Attached to NVMe Controller at 0001:10:00.0 [0c51:0110]
Associating PCIE (0001:10:00.0) NSID 1 with lcore 7
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0001:10:00.0) NSID 1 from core 7: 1313928.30 5132.53 97.41 5.27 4017.90
========================================================
Total : 1313928.30 5132.53 97.41 5.27 4017.90
- num_threads=2
Initializing NVMe Controllers
Attached to NVMe Controller at 0001:10:00.0 [0c51:0110]
Associating PCIE (0001:10:00.0) NSID 1 with lcore 6
Associating PCIE (0001:10:00.0) NSID 1 with lcore 7
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0001:10:00.0) NSID 1 from core 6: 632279.44 2469.84 202.43 176.45 1206.29
PCIE (0001:10:00.0) NSID 1 from core 7: 632282.84 2469.85 202.43 172.62 1208.60
========================================================
Total : 1264562.27 4939.70 202.43 172.62 1208.60
- num_threads=3
Initializing NVMe Controllers
Attached to NVMe Controller at 0001:10:00.0 [0c51:0110]
Associating PCIE (0001:10:00.0) NSID 1 with lcore 5
Associating PCIE (0001:10:00.0) NSID 1 with lcore 6
Associating PCIE (0001:10:00.0) NSID 1 with lcore 7
Initialization complete. Launching workers.
========================================================
Latency(us)
Device Information : IOPS MiB/s Average min max
PCIE (0001:10:00.0) NSID 1 from core 5: 374736.00 1463.81 341.58 86.08 4000.14
PCIE (0001:10:00.0) NSID 1 from core 6: 374783.30 1464.00 341.51 50.02 4011.29
PCIE (0001:10:00.0) NSID 1 from core 7: 374770.00 1463.95 341.52 82.66 4011.86
========================================================
Total : 1124289.30 4391.76 341.53 50.02 4011.86
As the number of threads increases, the per-core IOPS drops faster than the thread count grows, so the total IOPS ends up lower than the single-threaded result.
Thank you for the detailed description.
Unfortunately, I have failed to reproduce your results.
My results are:
Numjobs | IOPS | MiB/s | Avg latency (us) | Min latency (us) | Max latency (us) |
---|---|---|---|---|---|
thread1 | 5360.91 | 20.94 | 23914.86 | 15962.04 | 24088.36 |
thread2 | 128370.85 | 501.45 | 1995.05 | 585.17 | 35979.73 |
thread3 | 150069.05 | 586.21 | 2559.63 | 1256.68 | 47894.08 |
thread4 | 166449.35 | 650.19 | 3077.34 | 1930.38 | 35954.97 |
thread5 | 176229.7 | 688.4 | 3634.08 | 1954.18 | 27931.1 |
thread6 | 182166.35 | 711.59 | 4217.95 | 1351.2 | 35948.03 |
thread7 | 186052 | 726.77 | 4818.59 | 2118.34 | 40029.51 |
thread8 | 186064.9 | 726.82 | 5507.42 | 1447.77 | 43895.2 |
There are some additional settings that might be needed on your side (a rough sketch combining both is shown after this list).

- Set the CPU affinity for spdk_nvme_perf.
  It is important to make sure that spdk_nvme_perf's threads do not run on NVMeVirt's cores.
  Could you run the test with CPU affinity applied? You can see an example in eval_nvmev.sh.
- Set the NVM SSD performance parameters.
  Unlike the conventional SSD model, whose target performance parameters are hard-coded, the NVM SSD model needs performance parameters passed in by the user to emulate a real Optane SSD.
  You can use set_perf.py to set NVMeVirt's performance on the fly.
  (FYI, the configuration we used is `set_perf.py 12 14 2400 2000`.)
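For illustration, here is a rough sketch of the two settings combined. It is only a sketch: the memmap_start/memmap_size/cpus module parameters and the core numbers below are from my own setup (and from memory), so please adapt them to your machine and double-check them against the NVMeVirt README.

```sh
# Load NVMeVirt pinned to cores 0-1 (memory layout values are placeholders).
sudo insmod ./nvmev.ko memmap_start=64G memmap_size=4G cpus=0,1

# Set the NVM (Optane-like) performance targets on the fly.
sudo ./set_perf.py 12 14 2400 2000

# Run spdk_nvme_perf only on cores NVMeVirt does not use:
# core mask f0 = cores 4-7, so it never overlaps with cpus=0,1 above.
$SPDK_PERF_PATH -q 128 -o 4096 -w randread \
    -r 'trtype:PCIe traddr:0001.10.00.0' -t 20 -a 5 -c f0
```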
Hello, thank you for your detailed suggestions. I made sure that spdk_nvme_perf doesn't run on NVMeVirt's cores. However, I had initially overlooked the need to manually configure the NVM SSD's performance; I was using the default parameters, which appear to be 1ns. After running set_perf.py 12 14 2400 2000, the spdk_nvme_perf results were normal. However, when exploring nanosecond-level delays with multi-threaded runs, spdk_nvme_perf's results still didn't meet my expectations. Additionally, I found that leaving the delay at the default 1ns can lead to runtime errors, such as: nvme_pcie_common.c:657:nvme_pcie_qpair_submit_tracker: ERROR: sq_tail is passing sq_head!
First, sorry for my late response.
The default delay of 1ns means that the SSD's performance depends on the actual data transfer time.
(The data transfer time varies depending on what you use to move the data, i.e. memcpy or the DMA engine.)
Therefore, when running experiments, the effective per-I/O latency will not be 1ns.
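If you want to see what that floor looks like on your machine, one quick check (just a sketch reusing the flags from your own script) is to measure the per-I/O latency at queue depth 1 on a single core:

```sh
# QD=1, single worker core (mask 80 = core 7): the reported average latency
# approximates per-I/O data transfer plus software overhead, since the
# configured device delay (1ns by default) is negligible in comparison.
$SPDK_PERF_PATH -q 1 -o 4096 -w randread \
    -r 'trtype:PCIe traddr:0001.10.00.0' -t 10 -c 80
```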