openvstorage/alba

huge drop in read performance after playing around with asd-set-slowness


With all ASDs of the cache backend online we read 4530 KiB/s.
From the moment we introduce latency on the cache ASDs (via asd-set-slowness), read performance drops to 187 KiB/s.

I'm not sure whether I configured something wrong, but I don't think this is expected.

The environment is still online and the proxies are running at debug log level.
IPs: 10.100.189.31-33

packages:

ii  alba-ee                              1.5.17                                     amd64        the ALternative BAckend
ii  arakoon                              1.9.22                                     amd64        Simple consistent distributed key/value store
ii  libovsvolumedriver-ee                6.15.2-0                                   amd64        VolumeDriver client library
ii  volumedriver-ee-base                 6.16.6-0                                   amd64        VolumeDriver common infrastructure
ii  volumedriver-ee-server               6.16.6-0                                   amd64        VolumeDriver daemon and helpers

setup:
1 vpool
1 cachebackend
1 hddbackend

cachebackend:
1,0,1,1
fragment size: 16 MiB
No compression
No encryption

hddbackend:
1,2,2,2
fragment size: 2 MiB
No compression
AES-CTR-256 encryption

vpool:
2 proxies
write buffer: 1 GiB
SCO size: 4 MiB
Cluster size: 4 KiB
volume write buffer: 512 MiB
DTL: sync
Transport: TCP
Fragment cache: read/write quota: 10 GiB
No block cache

Added the following to the volumedriver config:
backend_interface_partial_read_retry_interval_msecs: 10
backend_interface_partial_read_timeout_msecs: 250

fio test:

[global]
thread
ioengine=openvstorage
direct=1
bs=4k
size=2G
rw=randread
#rwmixread=75
group_reporting=1
time_based=1
randrepeat=1
username=edgeuser01
password=wo8baiKohcuaK0Ma

[test]
iodepth=1
volumename=test_jeroen
protocol=tcp
hostname=10.100.189.32
port=26203

without slowness:

fio: time_based requires a runtime/timeout setting
test: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=openvstorage, iodepth=1
fio-2.19-61-gc789-dirty
Starting 1 thread
Jobs: 1 (f=1): [r(1)][99.8%][r=5056KiB/s,w=0KiB/s][r=1264,w=0 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=127841: Mon Nov 27 14:50:40 2017
   read: IOPS=1132, BW=4530KiB/s (4639kB/s)(2048MiB/462906msec)
    slat (usec): min=2, max=124, avg= 9.86, stdev= 5.69
    clat (usec): min=260, max=16846, avg=870.22, stdev=485.57
     lat (usec): min=266, max=16851, avg=880.08, stdev=485.91
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  446], 10.00th=[  462], 20.00th=[  466],
     | 30.00th=[  466], 40.00th=[  482], 50.00th=[  620], 60.00th=[  932],
     | 70.00th=[ 1240], 80.00th=[ 1384], 90.00th=[ 1544], 95.00th=[ 1704],
     | 99.00th=[ 2008], 99.50th=[ 2064], 99.90th=[ 2864], 99.95th=[ 4048],
     | 99.99th=[ 6944]
    lat (usec) : 500=41.05%, 750=14.67%, 1000=5.30%
    lat (msec) : 2=37.81%, 4=1.13%, 10=0.05%, 20=0.01%
  cpu          : usr=1.99%, sys=2.17%, ctx=2934701, majf=0, minf=35
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=524288,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=4530KiB/s (4639kB/s), 4530KiB/s-4530KiB/s (4639kB/s-4639kB/s), io=2048MiB (2147MB), run=462906-462906msec

with slowness set to 0.5,0.5:

test: (groupid=0, jobs=1): err= 0: pid=129924: Mon Nov 27 15:29:34 2017
   read: IOPS=46, BW=187KiB/s (191kB/s)(383MiB/2094742msec)
    slat (usec): min=3, max=93, avg=12.61, stdev= 7.41
    clat (usec): min=220, max=1984.6k, avg=21375.62, stdev=112392.48
     lat (usec): min=268, max=1984.7k, avg=21388.24, stdev=112392.71
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  310], 10.00th=[  330], 20.00th=[  466],
     | 30.00th=[  466], 40.00th=[  474], 50.00th=[  620], 60.00th=[ 3088],
     | 70.00th=[ 8896], 80.00th=[12480], 90.00th=[16064], 95.00th=[18304],
     | 99.00th=[774144], 99.50th=[897024], 99.90th=[987136], 99.95th=[995328],
     | 99.99th=[1036288]
    lat (usec) : 250=0.01%, 500=47.99%, 750=5.07%, 1000=0.47%
    lat (msec) : 2=0.24%, 4=8.67%, 10=9.75%, 20=25.13%, 50=0.49%
    lat (msec) : 100=0.03%, 250=0.01%, 750=1.07%, 1000=1.05%, 2000=0.04%
  cpu          : usr=0.54%, sys=1.88%, ctx=13533366, majf=0, minf=35
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=97921,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=187KiB/s (191kB/s), 187KiB/s-187KiB/s (191kB/s-191kB/s), io=383MiB (401MB), run=2094742-2094742msec
domsj commented

I think this is exactly the expected performance:

fragments on the HDDs are 2 MB
this means that, optimistically, we can fetch 50 fragments/s => ~20 ms latency per fragment

according to fio the avg latency is 21388.24 µs, which is only slightly more than 20 ms

FWIW, in this case performance would be better if no fragment cache were configured: then the proxy would do partial reads from the HDDs too, whereas now it will always fetch full fragments.
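
As a quick sanity check on those numbers, here is a back-of-the-envelope sketch (the ~100 MB/s per-HDD throughput is an assumption implied by the 50 fragments/s figure, not something measured in this issue):

```python
# Rough sanity check of the latency reasoning above (a sketch; the HDD
# throughput figure below is an assumption, not taken from the issue).

FRAGMENT_SIZE = 2 * 1024 * 1024     # hddbackend fragment size: 2 MiB
CLUSTER_SIZE = 4 * 1024             # vpool cluster size: 4 KiB
HDD_THROUGHPUT = 100 * 1024 * 1024  # assumed sequential throughput per HDD, bytes/s

# With a fragment cache configured, a cache miss makes the proxy fetch the
# whole fragment from the HDD backend:
full_fragment_ms = FRAGMENT_SIZE / HDD_THROUGHPUT * 1000
print(f"full 2 MiB fragment fetch: ~{full_fragment_ms:.0f} ms")
# -> ~20 ms, i.e. ~50 fragments/s, in line with the ~21.4 ms average
#    latency fio measured.

# Without a fragment cache the proxy would issue partial reads, transferring
# only the requested 4 KiB; the transfer itself then becomes negligible and
# the cost is dominated by the disk's seek/rotational latency instead:
partial_transfer_ms = CLUSTER_SIZE / HDD_THROUGHPUT * 1000
print(f"4 KiB partial-read transfer: ~{partial_transfer_ms:.2f} ms")
# -> ~0.04 ms of transfer time per read.
```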

@jeroenmaelbrancke isn't the partial read from HDD exactly the path you wanted to test by setting the ASD slowness?