parsa-epfl/cloudsuite

data-caching: cache not warmed up?

dev-zero opened this issue · 1 comment

I am running the following from the data-caching benchmark (a scale-and-warmup pass, followed by two identical max-throughput runs):

ssh cn02 '"podman" exec -it dc-client /bin/bash /entrypoint.sh --m='\''S&W'\'' --S=28 --D=10240 --w=8 --T=1'
scale and warmup
stats_time = 1
Configuration:

nProcessors on system: 80
nWorkers: 8
runtime: -1
Get fraction: 0.900000
Naggle's algorithm: False
...
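
As a sanity check after the S&W step, it might be worth querying the memcached server's own counters to confirm the dataset was actually loaded; a minimal sketch using the standard memcached text protocol (the `dc-server` hostname and the default port 11211 are assumptions, adjust to your deployment):

```bash
# "stats" is part of the standard memcached text protocol. curr_items and
# bytes show how much data the warmup actually loaded; get_hits/get_misses
# show whether later runs are being served from a warm cache.
printf 'stats\r\nquit\r\n' | nc dc-server 11211 \
  | grep -E 'STAT (curr_items|bytes|get_hits|get_misses) '
```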
ssh cn02 '"podman" exec -it dc-client timeout 20 /bin/bash /entrypoint.sh --m=TH --S=28 --g=0.8 --c=200 --w=8 --T=1'
max throughput
stats_time = 1
Configuration:

nProcessors on system: 80
nWorkers: 8
runtime: -1
Get fraction: 0.800000
Naggle's algorithm: False


host: cn02.can.pt.horizon-opencube.eu
address: 127.0.1.1
Loading key value file...created uniform distribution 1000
rps -1 cpus 8
num_worker_connections 25
num_worker_connections 25
Creating worker on tid 2796704032
starting receive base loop
num_worker_connections 25
Creating worker on tid 2788249888
starting receive base loop
num_worker_connections 25
Creating worker on tid 2779795744
starting receive base loop
num_worker_connections 25
Creating worker on tid 2771341600
starting receive base loop
num_worker_connections 25
Creating worker on tid 2762887456
starting receive base loop
num_worker_connections 25
Creating worker on tid 2415915296
starting receive base loop
num_worker_connections 25
Creating worker on tid 2407461152
starting receive base loop
Created 200 connections total
Creating worker on tid 2399007008
starting receive base loop
Stats:
-------------------------
   unix_ts,  timeDiff,     rps,        requests,     gets,       sets,      hits,       misses,   avg_lat,      90th,      95th,        99th,       std,       min,        max,    avgGetSize
1701204605, 1701206306.204605,       0.0,     1258312,    1006803,     251509,    1006803,          0, 122.062654, 220.900000, 221.200000, 222.300000,  82.797673,   0.000000, 222.820997, 1081.453474
Outstanding requests per worker:
381 394 9146 9000 8973 18597 18320 18970
   unix_ts,  timeDiff,     rps,        requests,     gets,       sets,      hits,       misses,   avg_lat,      90th,      95th,        99th,       std,       min,        max,    avgGetSize
1701204606,   1.000001,  671913.3,      671914,     537918,     133996,     537918,          0, 124.707868, 221.000000, 221.300000, 221.600000,  84.307503,   4.373000, 221.882999, 1081.781095
Outstanding requests per worker:
419 421 9146 9000 8973 18597 18320 18970
...
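
To compare the two runs quantitatively, the per-second 99th-percentile latency can be pulled out of the stats lines; it is the 12th comma-separated field. A quick sketch, assuming each run's output was captured to a log file such as run1.log (a hypothetical name):

```bash
# Stats lines start with a 10-digit unix timestamp; field 12 is the
# 99th-percentile column of the client's stats table.
grep -E '^[0-9]{10},' run1.log | awk -F',' '{ gsub(/ /, "", $12); print $12 }'
```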
ssh cn02 '"podman" exec -it dc-client timeout 20 /bin/bash /entrypoint.sh --m=TH --S=28 --g=0.8 --c=200 --w=8 --T=1'
max throughput
stats_time = 1
Configuration:

nProcessors on system: 80
nWorkers: 8
runtime: -1
Get fraction: 0.800000
Naggle's algorithm: False


host: cn02.can.pt.horizon-opencube.eu
address: 127.0.1.1
Loading key value file...created uniform distribution 1000
rps -1 cpus 8
num_worker_connections 25
num_worker_connections 25
Creating worker on tid 2040156448
starting receive base loop
num_worker_connections 25
Creating worker on tid 2031702304
starting receive base loop
num_worker_connections 25
Creating worker on tid 2023248160
starting receive base loop
num_worker_connections 25
Creating worker on tid 1811935520
starting receive base loop
num_worker_connections 25
Creating worker on tid 1803481376
starting receive base loop
num_worker_connections 25
Creating worker on tid 1795027232
starting receive base loop
num_worker_connections 25
Creating worker on tid 1786573088
starting receive base loop
Created 200 connections total
Creating worker on tid 1778118944
starting receive base loop
Stats:
-------------------------
   unix_ts,  timeDiff,     rps,        requests,     gets,       sets,      hits,       misses,   avg_lat,      90th,      95th,        99th,       std,       min,        max,    avgGetSize
1701204626, 1701206327.204626,       0.0,     1325735,    1061150,     264585,    1061150,          0,   2.695943,   3.000000,   3.000000,   3.100000,   0.391165,   0.000000,  10.986000, 1081.772463
Outstanding requests per worker:
235 237 241 242 240 241 244 246
   unix_ts,  timeDiff,     rps,        requests,     gets,       sets,      hits,       misses,   avg_lat,      90th,      95th,        99th,       std,       min,        max,    avgGetSize
1701204627,   1.000001,  660475.3,      660476,     528533,     131943,     528533,          0,   3.186954,   3.400000,   3.400000,   3.500000,   0.461673,   2.628000,  10.708000, 1081.651123
Outstanding requests per worker:
263 274 268 276 273 273 271 271
   unix_ts,  timeDiff,     rps,        requests,     gets,       sets,      hits,       misses,   avg_lat,      90th,      95th,        99th,       std,       min,        max,    avgGetSize
1701204628,   1.000001,  667635.3,      667636,     534458,     133178,     534458,          0,   3.435901,   3.600000,   3.700000,   3.700000,   0.122620,   2.946000,   4.048000, 1082.166988
Outstanding requests per worker:
295 287 277 299 285 288 281 299
...

Essentially, the second time I run the throughput-maximising client benchmark I get significantly lower tail latency at roughly the same RPS. I would interpret this as the cache not actually having been warmed up by the S&W step, so the first throughput run is what effectively warmed it; is that correct?
Letting the second throughput benchmark run longer causes the 99th percentile to climb slowly, but it remains below 14 usec.
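
For what it's worth, one way to separate warm-cache from cold-cache behaviour would be to explicitly invalidate the server between runs and redo the warmup; a sketch, assuming the server is reachable as `dc-server` on the default memcached port (both are assumptions about the deployment):

```bash
# flush_all is a standard memcached command that invalidates every item,
# so the next run starts from a deliberately cold cache ...
printf 'flush_all\r\nquit\r\n' | nc dc-server 11211
# ... and the S&W step from above can then re-warm it before the next
# max-throughput run.
ssh cn02 '"podman" exec -it dc-client /bin/bash /entrypoint.sh --m='\''S&W'\'' --S=28 --D=10240 --w=8 --T=1'
```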