lmbench takes too long for large nvme disks
I built lmbench from the source at https://github.com/intel/lmbench/ and ran the resulting binary.
The following command was used to run lmbench:
#lmbench
The config file content is as follows:
DISKS=""
DISK_DESC=""
OUTPUT="/dev/tty"
ENOUGH=5000
FASTMEM="NO"
FILE="/usr/tmp/XXX"
FSDIR="/usr/tmp"
INFO=INFO.myserver.com
LINE_SIZE=128
LOOP_O=0.00000030
MAIL=no
TOTAL_MEM=509856.46875
MB=407885
MHZ="1494 MHz, 0.6693 nanosec clock
"
MOTHERBOARD=
NETWORKS=
OS=x86_64-Linux
PROCESSORS=40
REMOTE=
SLOWFS="NO"
SYNC_MAX="1"
LMBENCH_SCHED="DEFAULT"
TIMING_O=0
RSH=rsh
RCP=rcp
VERSION=3.0-20100921
BENCHMARK_HARDWARE=NO
BENCHMARK_OS=NO
BENCHMARK_SYSCALL=NO
BENCHMARK_SELECT=NO
BENCHMARK_SIG=NO
BENCHMARK_PROC=NO
BENCHMARK_CTX=NO
BENCHMARK_PAGEFAULT=NO
BENCHMARK_FILE=NO
BENCHMARK_MMAP=NO
BENCHMARK_PIPE=NO
BENCHMARK_UNIX=NO
BENCHMARK_UDP=NO
BENCHMARK_TCP=NO
BENCHMARK_CONNECT=NO
BENCHMARK_RPC=NO
BENCHMARK_HTTP=NO
BENCHMARK_BCOPY=NO
BENCHMARK_MEM=NO
BENCHMARK_OPS=NO
DISKS=/dev/nvme0n1p2
DISK_DESC="none"
With large NVMe disks (2 TB or more), lmbench sometimes runs for hours, occasionally for more than a day, and appears stuck at the "Calculating disk zone bw & seek times" stage of the output.
This unusually long completion time is not seen with non-NVMe disks. It occurs only with large NVMe disks (sizes in the TB range).
I then fetched the latest source, which was updated recently, and ran the disk binary against a 5.8T disk:
[root@localhost ]# lsblk | grep nvme0n1
nvme0n1 259:0 0 5.8T 0 disk
[root@localhost SOURCES]# disk /dev/nvme0n1
The above command does not complete even after hours, and the result file grows very large.
Calculating the disk zone bandwidth takes too long.
With the code changes in the diff below, it works fine for all disk sizes.
[root@localhost lmbench-master]# diff -Nrup orig_disk.c src/disk.c
--- orig_disk.c 2019-10-16 02:45:06.193140852 -0400
+++ src/disk.c 2019-10-16 04:49:34.824774418 -0400
@@ -49,7 +49,7 @@ zone(char *disk, int oflag, int bsize)
int n;
int fd;
uint64 off;
- int stride;
+ uint64 stride;
if ((fd = open(disk, oflag)) == -1) {
perror(disk);
@@ -88,8 +88,8 @@ zone(char *disk, int oflag, int bsize)
if (bsize > stride) stride = bsize;
off *= ZONEPOINTS;
- debug((stdout, "stride=%d bs=%d size=%dM points=%d\n",
- stride, bsize, (int)(off >> 20), (int)(off/stride)));
+ debug((stdout, "stride=%llu bs=%d size=%lluM points=%llu\n",
+ (unsigned long long)stride, bsize, (unsigned long long)(off >> 20), (unsigned long long)(off/stride)));
/*
* Read buf's worth of data every stride and time it.
@@ -142,12 +142,12 @@ seek(char *disk, int oflag)
{
char *buf;
int fd;
- off64_t size;
- off64_t begin, end;
+ uint64 size;
+ uint64 begin, end;
int usecs;
int error;
int tot_msec = 0, tot_io = 0;
- int stride;
+ uint64 stride;
if ((fd = open(disk, oflag)) == -1) {
perror(disk);
@@ -174,8 +174,8 @@ seek(char *disk, int oflag)
stride >>= 9;
stride <<= 9;
- debug((stdout, "stride=%d size=%dM points=%d\n",
- stride, (int)(size >> 20), (int)(size/stride)));
+ debug((stdout, "stride=%llu size=%lluM points=%llu\n",
+ (unsigned long long)stride, (unsigned long long)(size >> 20), (unsigned long long)(size/stride)));
end = size;
begin = 0;