pinterest/pymemcache

Performance issue with get_many

matejsp opened this issue · 2 comments

When we replaced python-memcached with pymemcache in production, we noticed increased latencies on our endpoints in a setup with several memcached servers (4).

After drilling down, the major performance difference comes from how the two libraries implement get_many:

python-memcached get_many:
https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1195
send request to first server, send request to second server, send request to third server, ...
wait for response from first server, then second server, etc.

pymemcache:
https://github.com/pinterest/pymemcache/blob/master/pymemcache/client/base.py#L1182 and https://github.com/pinterest/pymemcache/blob/master/pymemcache/client/hash.py#L400
send request to first server, wait for response,
send request to second server, wait for response, ...
...
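
To make the difference in ordering concrete, here is a toy trace of the two patterns. `FakeConn` and the function names are hypothetical stand-ins that only record call order; they are not part of either library's API.

```python
class FakeConn:
    """Hypothetical stand-in for a server connection; it only records call order."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def send(self):
        self.log.append(f"send {self.name}")

    def recv(self):
        self.log.append(f"recv {self.name}")


def one_at_a_time(conns):
    # Pattern described above for pymemcache: finish one server before the next.
    for conn in conns:
        conn.send()
        conn.recv()


def send_all_then_receive(conns):
    # Pattern described above for python-memcached: all requests go out first.
    for conn in conns:
        conn.send()
    for conn in conns:
        conn.recv()


def trace(pattern, names=("s1", "s2", "s3")):
    """Run a pattern against fresh fake connections and return the call log."""
    log = []
    pattern([FakeConn(name, log) for name in names])
    return log
```

In the second trace every `send` precedes the first `recv`, so the servers can work on their requests at the same time.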

Do you have any idea how to optimise this?

jogo commented

Great find @matejsp, I suspect the way to optimize this is to refactor the hashing client to use a similar pattern to python-memcached's model.

We haven't hit this issue ourselves as we use mcrouter.

Since I referenced this issue in Django, I would like to share some additional benchmarks that I made using pymemcache and python-memcached inside Django.

We are using memcached (ElastiCache), where a round trip to each server takes 1 ms on average.

In our case we use 4 memcached servers. When we call get_many with pymemcache, the keys are split across all 4 servers according to the hash algorithm, and get_many is called on each server in turn. Each call takes 1 ms, so we observed a total time of 4 ms (it can also be 1, 2, or 3 ms, depending on the hash key distribution and how many servers a given get_many hits).

In python-memcached, the loop over the 4 servers is optimized: it first sends the request to all 4 servers and only then receives the response from each of them, so the whole get_many call completes in about 1 ms (it basically waits for the slowest server).
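
As a rough model of the numbers above (an assumption: it counts only per-server round-trip time and ignores serialization and send cost), the sequential pattern adds the round trips up while the send-first pattern is bounded by the slowest one. The function names are made up for this illustration.

```python
def sequential_total_ms(round_trips_ms):
    """Each server is queried only after the previous one has answered."""
    return sum(round_trips_ms)


def overlapped_total_ms(round_trips_ms):
    """All requests are sent first, so the total is set by the slowest server."""
    return max(round_trips_ms, default=0)


# Four servers hit, 1 ms round trip each, as in the measurements above.
servers_hit = [1, 1, 1, 1]
print(sequential_total_ms(servers_hit))  # 4
print(overlapped_total_ms(servers_hit))  # 1
```

If the keys happen to hash to only two servers, the same model gives 2 ms versus 1 ms.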

I checked the code, and from what I can see _fetch_cmd/_store_cmd should be split into separate sending and receiving steps, which would then somehow be lifted up into the hash client while taking connection pooling into account. You would need to get the 4 server connections from the pool, call send on each, and then wait to receive from all of them.
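
A minimal sketch of that split, assuming plain sockets and the memcached text protocol rather than pymemcache's actual internals. `send_get`, `recv_get`, and `send_then_receive_get_many` are hypothetical names, and connection pooling, timeouts, and error handling are left out.

```python
import socket


def send_get(sock, keys):
    """Write a multi-key `get` request without waiting for the reply."""
    sock.sendall(b"get " + b" ".join(keys) + b"\r\n")


def recv_get(sock):
    """Read one `get` reply (VALUE blocks terminated by END) into a dict.

    Assumes exactly one outstanding reply on this socket.
    """
    result = {}
    with sock.makefile("rb") as reader:
        while True:
            line = reader.readline()
            if not line or line == b"END\r\n":
                break
            # Reply header: VALUE <key> <flags> <bytes>\r\n
            _, key, _flags, size = line.split()
            result[key] = reader.read(int(size))
            reader.read(2)  # discard the \r\n that follows the data block
    return result


def send_then_receive_get_many(keys_by_conn):
    """keys_by_conn: list of (socket, [key, ...]) pairs, one per server."""
    # Phase 1: send every request before reading anything.
    for sock, keys in keys_by_conn:
        send_get(sock, keys)
    # Phase 2: collect the replies; the servers have been working in parallel.
    result = {}
    for sock, _keys in keys_by_conn:
        result.update(recv_get(sock))
    return result
```

In a real hash client the sockets would come from each server's connection pool and would have to be returned to it after phase 2, including on errors.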