intel/lmbench

lmbench takes too long for large nvme disks

I built lmbench from the source at https://github.com/intel/lmbench/ and ran the resulting binary.

The following command was used to run lmbench:
# lmbench

The config file contents are as follows:

DISKS=""
DISK_DESC=""
OUTPUT="/dev/tty"
ENOUGH=5000
FASTMEM="NO"
FILE="/usr/tmp/XXX"
FSDIR="/usr/tmp"
INFO=INFO.myserver.com
LINE_SIZE=128
LOOP_O=0.00000030
MAIL=no
TOTAL_MEM=509856.46875
MB=407885
MHZ="1494 MHz, 0.6693 nanosec clock
"
MOTHERBOARD=
NETWORKS=
OS=x86_64-Linux
PROCESSORS=40
REMOTE=
SLOWFS="NO"
SYNC_MAX="1"
LMBENCH_SCHED="DEFAULT"
TIMING_O=0
RSH=rsh
RCP=rcp
VERSION=3.0-20100921

BENCHMARK_HARDWARE=NO
BENCHMARK_OS=NO
BENCHMARK_SYSCALL=NO
BENCHMARK_SELECT=NO
BENCHMARK_SIG=NO
BENCHMARK_PROC=NO
BENCHMARK_CTX=NO
BENCHMARK_PAGEFAULT=NO
BENCHMARK_FILE=NO
BENCHMARK_MMAP=NO
BENCHMARK_PIPE=NO
BENCHMARK_UNIX=NO
BENCHMARK_UDP=NO
BENCHMARK_TCP=NO
BENCHMARK_CONNECT=NO
BENCHMARK_RPC=NO
BENCHMARK_HTTP=NO
BENCHMARK_BCOPY=NO
BENCHMARK_MEM=NO
BENCHMARK_OPS=NO
DISKS=/dev/nvme0n1p2
DISK_DESC="none"


With large NVMe disks (2 TB or more), lmbench sometimes runs for hours (even more than a day) and appears stuck at the "Calculating disk zone bw & seek times" stage of the output.
This unusually long run time is not seen with non-NVMe disks; it occurs only with large NVMe disks (sizes in the TB range).

I then fetched the latest, recently updated source and ran the disk binary directly against a 5.8T disk.

[root@localhost ]# lsblk | grep nvme0n1
nvme0n1 259:0 0 5.8T 0 disk

[root@localhost SOURCES]# disk /dev/nvme0n1

The above command does not complete even after hours, and the result file grows very large.
Calculating the disk zone bandwidth takes too long.
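
One plausible explanation, consistent with the diff further down widening `stride` from `int` to `uint64`, is that for a multi-terabyte disk the per-point stride (disk size divided by the number of zone points) no longer fits in a 32-bit `int`. The sketch below is not the lmbench code. It mimics the stride arithmetic under stated assumptions: a `ZONEPOINTS` value of 150, a 512-byte minimum stride, and the illustrative helper names `round_stride`, `zone_points_narrow` and `zone_points_wide`.

```c
#include <stdint.h>

#define ZONEPOINTS 150 /* assumption: number of sample points per disk */

/* Round a stride up to a multiple of 512 bytes, with a 512-byte minimum. */
static int64_t round_stride(int64_t stride)
{
    if (stride < 512) stride = 512; /* also catches negative values */
    return ((stride + 511) >> 9) << 9;
}

/* Number of reads when the stride is squeezed through a 32-bit int. */
static uint64_t zone_points_narrow(uint64_t disk_bytes, int32_t bsize)
{
    uint64_t off = disk_bytes / ZONEPOINTS;
    /* Emulate "int stride = off;": keep the low 32 bits and reinterpret
     * them as signed (implementation-defined in C; wraps on gcc/clang). */
    int64_t stride = round_stride((int32_t)(uint32_t)off);
    if (bsize > stride) stride = bsize;
    return (off * ZONEPOINTS) / (uint64_t)stride;
}

/* Number of reads when the stride keeps its full 64-bit value. */
static uint64_t zone_points_wide(uint64_t disk_bytes, int32_t bsize)
{
    uint64_t off = disk_bytes / ZONEPOINTS;
    int64_t stride = round_stride((int64_t)off);
    if (bsize > stride) stride = bsize;
    return (off * ZONEPOINTS) / (uint64_t)stride;
}
```

For a hypothetical 6,400,000,000,000-byte disk and an assumed 1 MiB block size, `zone_points_wide` gives 149 reads, while `zone_points_narrow` gives roughly 6.1 million: the truncated stride goes negative, is clamped to 512, and is then raised only to the block size, so nearly the whole disk is read.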

With the code changes shown in the diff below, disk works fine for all disk sizes.

[root@localhost lmbench-master]# diff -Nrup orig_disk.c src/disk.c
--- orig_disk.c 2019-10-16 02:45:06.193140852 -0400
+++ src/disk.c  2019-10-16 04:49:34.824774418 -0400
@@ -49,7 +49,7 @@ zone(char *disk, int oflag, int bsize)
        int     n;
        int     fd;
        uint64  off;
-       int     stride;
+       uint64  stride;

        if ((fd = open(disk, oflag)) == -1) {
                perror(disk);
@@ -88,8 +88,8 @@ zone(char *disk, int oflag, int bsize)
        if (bsize > stride) stride = bsize;

        off *= ZONEPOINTS;
-       debug((stdout, "stride=%d bs=%d size=%dM points=%d\n",
-           stride, bsize, (int)(off >> 20), (int)(off/stride)));
+       debug((stdout, "stride=%u bs=%d size=%uM points=%u\n",
+           stride, bsize, (uint64)(off >> 20), (uint64)(off/stride)));

        /*
         * Read buf's worth of data every stride and time it.
@@ -142,12 +142,12 @@ seek(char *disk, int oflag)
 {
        char    *buf;
        int     fd;
-       off64_t size;
-       off64_t begin, end;
+       uint64  size;
+       uint64  begin, end;
        int     usecs;
        int     error;
        int     tot_msec = 0, tot_io = 0;
-       int     stride;
+       uint64  stride;

        if ((fd = open(disk, oflag)) == -1) {
                perror(disk);
@@ -174,8 +174,8 @@ seek(char *disk, int oflag)
        stride >>= 9;
        stride <<= 9;

-       debug((stdout, "stride=%d size=%dM points=%d\n",
-           stride, (int)(size >> 20), (int)(size/stride)));
+       debug((stdout, "stride=%u size=%uM points=%u\n",
+           stride, (uint64)(size >> 20), (uint64)(size/stride)));

        end = size;
        begin = 0;
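
A side note on the debug lines in the patch: `%u` expects an `unsigned int`, so passing 64-bit values to it is not portable and can print truncated numbers. Below is a minimal sketch of one portable alternative, using the `PRIu64` macro from `<inttypes.h>`. `format_stride_debug` is a hypothetical helper for illustration and not part of lmbench, and the sketch assumes lmbench's `uint64` is equivalent to `uint64_t`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the zone() debug message with 64-bit-safe
 * conversions. Returns the snprintf() result, or -1 if stride is zero.
 */
static int format_stride_debug(char *out, size_t outlen,
                               uint64_t stride, int bsize, uint64_t size_bytes)
{
    if (stride == 0)
        return -1; /* avoid dividing by zero below */
    return snprintf(out, outlen,
        "stride=%" PRIu64 " bs=%d size=%" PRIu64 "M points=%" PRIu64,
        stride, bsize, size_bytes >> 20, size_bytes / stride);
}
```

Casting to `unsigned long long` and using `%llu` would be an equivalent choice on compilers that lack `<inttypes.h>`.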