key pool can cause memtier to hang when there are multiple shards and request count is below pool size
filipecosta90 opened this issue
This can easily be reproduced with master (or any previous stable version) on an OSS cluster with more than one shard.
Take a simple >1 shard scenario:
$ redis-cli
127.0.0.1:6379> cluster slots
1) 1) (integer) 0
2) (integer) 5461
3) 1) "127.0.0.1"
2) (integer) 6379
3) "ae7607ffdb3519473d83a9420f3a00c162820a8a"
4) (empty array)
2) 1) (integer) 5462
2) (integer) 10923
3) 1) "127.0.0.1"
2) (integer) 6381
3) "ffa17c93a302f721c18377c7256053ecb8175e97"
4) (empty array)
3) 1) (integer) 10924
2) (integer) 16383
3) 1) "127.0.0.1"
2) (integer) 6383
3) "eafe2ee085ca92723662dd18661088f70962c197"
4) (empty array)
In the debug run below you can see that there is no assurance that the 4 generated requests are for the connection that is sending them (given we have 3 connections):
$ memtier_benchmark --threads 1 --clients 1 -n 4 --ratio 0:1 --cluster-mode --hide-histogram -D
Writing results to stdout
[RUN #1] Preparing benchmark client...
client.cpp:111: new client 0x55a181879400 successfully set up.
[RUN #1] Launching threads now...
shard_connection.cpp:372: sending cluster slots command.
shard_connection.cpp:425: cluster slot command successful
shard_connection.cpp:587: server 127.0.0.1:6379: GET key=[memtier-1168204]
shard_connection.cpp:587: server 127.0.0.1:6383: GET key=[memtier-6263819]
shard_connection.cpp:438: server 127.0.0.1:6379: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:587: server 127.0.0.1:6379: GET key=[memtier-3771236]
shard_connection.cpp:438: server 127.0.0.1:6383: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:587: server 127.0.0.1:6383: GET key=[memtier-5586315]
shard_connection.cpp:438: server 127.0.0.1:6379: handled response (first line): $-1, 0 hits, 1 misses
shard_connection.cpp:438: server 127.0.0.1:6383: handled response (first line): $-1, 0 hits, 1 misses
client.cpp:219: nothing else to do, test is finished.
[RUN #1 100%, 0 secs] 1 threads: 4 ops, 7782 (avg: 7782) ops/sec, 303.99KB/sec (avg: 303.99KB/sec), 0.20 (avg: 0.20) msec latency
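For context, the shard that owns a key is determined solely by the key's CRC16 slot, not by the connection that generated it. Here is a minimal standalone sketch (not memtier code; it only reuses the key names from the debug log and the slot ranges from the cluster slots output above) that maps each key to its owning shard:

#include <cstdint>
#include <cstdio>
#include <cstring>

// CRC16-CCITT (XModem), polynomial 0x1021, initial value 0x0000:
// the hash Redis Cluster uses for slot assignment.
static uint16_t crc16(const char *buf, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i]) << 8;
        for (int j = 0; j < 8; j++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

int main() {
    // Key names taken from the debug log above.
    const char *keys[] = {"memtier-1168204", "memtier-6263819",
                          "memtier-3771236", "memtier-5586315"};
    for (const char *k : keys) {
        unsigned slot = crc16(k, strlen(k)) % 16384;
        // Slot ranges taken from the `cluster slots` output above.
        const char *owner = (slot <= 5461)  ? "127.0.0.1:6379"
                          : (slot <= 10923) ? "127.0.0.1:6381"
                                            : "127.0.0.1:6383";
        printf("%-18s -> slot %5u -> %s\n", k, slot, owner);
    }
    return 0;
}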
This is due to the following finish condition
m_config->requests > 0 && m_reqs_processed >= m_config->requests
combined with https://github.com/RedisLabs/memtier_benchmark/blob/master/cluster_client.cpp#L352:
// store key for other connection, if queue is not full
key_index_pool* key_idx_pool = m_key_index_pools[other_conn_id];
if (key_idx_pool->size() < KEY_INDEX_QUEUE_MAX_SIZE) {
    key_idx_pool->push(*key_index);
    m_reqs_generated++;
}
A key pushed to another connection's pool is counted in m_reqs_generated, but it may never actually be sent, so m_reqs_processed can stay below m_config->requests forever and the finish condition above is never satisfied. This can be solved by avoiding pushing to other pools when we don't have enough requests to fill all pools of all shards.
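One possible shape for that guard, sketched against the snippet above (an illustration of the idea only, not a tested patch; num_shard_connections is a hypothetical name for however the client tracks its number of shard connections):

// Sketch only: when the configured request count is too small to fill the
// key pools of every shard connection, skip storing keys for other
// connections, so no generated request can get stranded in a pool that is
// never drained.
bool pools_can_be_filled =
    m_config->requests == 0 ||  // 0 means the run is bounded by time, not by request count
    m_config->requests >= num_shard_connections * KEY_INDEX_QUEUE_MAX_SIZE;

key_index_pool* key_idx_pool = m_key_index_pools[other_conn_id];
if (pools_can_be_filled && key_idx_pool->size() < KEY_INDEX_QUEUE_MAX_SIZE) {
    key_idx_pool->push(*key_index);
    m_reqs_generated++;
}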