mumoshu/play2-memcached

Connection Failures thrown when using clustered memcached server

johnwrf opened this issue · 7 comments

Hello,

We have set up a memcached cluster, i.e. a load balancer in front of two memcached instances. When we configure memcached.host in application.conf to point to the load balancer, we get connection errors. The errors occur at the moment we call cache.get. Here is the sample configuration:

application.conf =
ehcacheplugin=disabled
memcached.host="elb-url:11211"

NOTES

  1. This is the URL reported in the second error:
    load-memcached-elb-1610895556.us-east-1.elb.amazonaws.com/52.4.194.128:11211
  2. One of the errors indicates there were problems trying to reconnect. Is this possibly because the ELB does not guarantee routing back to the same node? How can we fix this?
    a) use a different type of load balancer (other than ELB)
    b) do not point memcached.host at the memcached ELB

Exception 1:
java.util.concurrent.ExecutionException: java.util.concurrent.CancellationException: Cancelled
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:170) ~[net.spy.spymemcached-2.9.0.jar:2.9.0]
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:62) ~[net.spy.spymemcached-2.9.0.jar:2.9.0]
at com.github.mumoshu.play2.memcached.MemcachedPlugin$$anon$2.get(MemcachedPlugin.scala:112) ~[com.github.mumoshu.play2-memcached_2.10-0.6.0.jar:0.6.0]
at play.api.cache.Cache$.get(Cache.scala:80) [com.typesafe.play.play-cache_2.10-2.3.8.jar:2.3.8]
at play.api.cache.Cache.get(Cache.scala) [com.typesafe.play.play-cache_2.10-2.3.8.jar:2.3.8]
at play.cache.Cache.get(Cache.java:19) [com.typesafe.play.play-cache_2.10-2.3.8.jar:2.3.8]
at com.healthfleet.util.CacheUtil.cacheGet(CacheUtil.java:71) [prometheus.prometheus-1.3128-SNAPSHOT.jar:1.3128-SNAPSHOT]
........

Exception 2:
net.spy.memcached.internal.CheckedOperationTimeoutException: Timed out waiting for operation - failing node: load-memcached-elb-1610895556.us-east-1.elb.amazonaws.com/52.4.194.128:11211
at net.spy.memcached.internal.OperationFuture.get(OperationFuture.java:160) ~[net.spy.spymemcached-2.9.0.jar:2.9.0]
at net.spy.memcached.internal.GetFuture.get(GetFuture.java:62) ~[net.spy.spymemcached-2.9.0.jar:2.9.0]
at com.github.mumoshu.play2.memcached.MemcachedPlugin$$anon$2.get(MemcachedPlugin.scala:112) ~[com.github.mumoshu.play2-memcached_2.10-0.6.0.jar:0.6.0]
at play.api.cache.Cache$.get(Cache.scala:80) [com.typesafe.play.play-cache_2.10-2.3.8.jar:2.3.8]
at play.api.cache.Cache.get(Cache.scala) [com.typesafe.play.play-cache_2.10-2.3.8.jar:2.3.8]
at play.cache.Cache.get(Cache.java:19) [com.typesafe.play.play-cache_2.10-2.3.8.jar:2.3.8]
at com.healthfleet.util.CacheUtil.cacheGet(CacheUtil.java:71) [prometheus.prometheus-1.3128-SNAPSHOT.jar:1.3128-SNAPSHOT]
.......

These are alternate settings that let us point directly to the instances behind the load balancer:
memcached.1.host="instance1-IPAddress:11211"
memcached.2.host="instance2-IPAddress:11211"

@johnwrf

(Hi, thanks for using this plugin!)

You really should use the alternate setting you've mentioned:

memcached.1.host="instance1-IPAddress:11211"
memcached.2.host="instance2-IPAddress:11211"

Using ELB like this really doesn't make sense. Remove it!!!!

memcached.host="elb-url:11211"

Putting memcached hosts behind a load balancer doesn't make sense because an ELB gives you no way to stick to the same memcached host.
If you try it, you'll write to host A and then read from host B, ending up with a cache miss.
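
To make the cache-miss scenario concrete, here is a minimal sketch. It assumes a simplified balancer that alternates strictly between two nodes on every request; a real ELB balances per connection and its behavior differs in detail. The class name and the in-memory maps standing in for memcached nodes are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class LoadBalancedCacheSketch {
    // Two independent in-memory maps stand in for two memcached nodes.
    // Like real memcached nodes, they share no data with each other.
    private final Map<String, String> nodeA = new HashMap<>();
    private final Map<String, String> nodeB = new HashMap<>();
    private int requests = 0;

    // Hypothetical round-robin balancer: picks a node without looking at the key.
    private Map<String, String> nextNode() {
        return (requests++ % 2 == 0) ? nodeA : nodeB;
    }

    void set(String key, String value) {
        nextNode().put(key, value);
    }

    String get(String key) {
        return nextNode().get(key);
    }

    public static void main(String[] args) {
        LoadBalancedCacheSketch cache = new LoadBalancedCacheSketch();
        cache.set("user:1", "alice");            // first request lands on node A
        System.out.println(cache.get("user:1")); // second request lands on node B: prints null
    }
}
```

The value was stored successfully, but the read still misses because the balancer chose a node by turn rather than by key.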

If you are trying to use ELB to distribute reads/writes to Memcached hosts, just use the alternate setting instead.
By using it, reads/writes are distributed to hosts according to keys.
See spymemcached's code to understand how it works under the hood.
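
As a rough illustration of distributing by key, the sketch below hashes the key and takes it modulo the number of configured servers. This is a simplification written for this thread, not spymemcached's actual code; the class and method names are hypothetical, and the use of String.hashCode() as the hash is an assumption.

```java
import java.util.Arrays;
import java.util.List;

public class ModuloLocatorSketch {
    // Hypothetical helper: picks a server from the key's hash modulo the server count.
    static String serverForKey(String key, List<String> servers) {
        // Mask to an unsigned 32-bit value so the index is never negative.
        long hash = key.hashCode() & 0xffffffffL;
        return servers.get((int) (hash % servers.size()));
    }

    public static void main(String[] args) {
        // Placeholder host names taken from the configuration above.
        List<String> servers = Arrays.asList(
                "instance1-IPAddress:11211",
                "instance2-IPAddress:11211");
        for (String key : new String[] {"user:1", "user:2", "session:abc"}) {
            System.out.println(key + " -> " + serverForKey(key, servers));
        }
    }
}
```

Because the server is derived from the key alone, a read for a given key goes to the same host that received the write, as long as the server list is unchanged.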

hello mumoshu,
Thanks so much for your response. When we finish, I will link / copy what you posted here to another thread I started in Google Groups, so everyone can benefit.
https://groups.google.com/forum/#!topic/play-framework/jiFMOq_CB50

additional questions..

  1. Can you please provide more information on the alternate setting? I did not see anything in the docs for that. Also, I am not sure where to look in spymemcached's code. -- thanks for the link :)

  2. The memcached.1.host / memcached.2.host type of configuration implies that the app itself needs to know how many memcached hosts are in the cluster. Can you provide recommendations on how to address these concerns:

  • What if a memcached host goes down? Should a DNS name be used so that the app is not hardwired to an IP address? Then we could implement failover for each memcached host. In the meantime, would all the related caching operations fail? Or is there redundancy across hosts that would avoid caching failures?
  • What if a memcached host is added to or removed from the cluster? This configuration implies that the application.conf files for all running app servers need to be updated accordingly, and those apps need to be restarted. Is there a better way than this?

  3. Finally, how does your plugin normally decide which memcached host to cache data to, without the alternate setting?

Thanks again !

@mumoshu
I came across this article on consistent hashing and found it very relevant to your memcached plugin implementation. Have you implemented consistent hashing for selecting the cache host in a multiple-host configuration?

https://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html

@mumoshu
We switched to the alternate setting, where each memcached host is registered in application.conf.

How does the memcached plugin decide which memcached host to send a request to?

@johnwrf Selecting the host for a request is done in the spymemcached library, which this plugin uses internally to communicate with Memcached. Consistent hashing is implemented in spymemcached but not used in this plugin. It wouldn't be difficult to let this plugin switch between different hashing algorithms, though.
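
For readers unfamiliar with the idea, here is a minimal sketch of consistent hashing using a sorted ring of hash points. It is a generic illustration, not spymemcached's Ketama implementation; the class name, the MD5-based ring hash, and the pointsPerServer parameter are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashSketch {
    // Ring position -> server. Each server owns several points on the ring.
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashSketch(List<String> servers, int pointsPerServer) {
        for (String server : servers) {
            for (int i = 0; i < pointsPerServer; i++) {
                ring.put(hash(server + "#" + i), server);
            }
        }
    }

    // Hypothetical ring hash: the first 8 bytes of the MD5 digest as a long.
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xff);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Walk clockwise from the key's position to the next server point,
    // wrapping around to the first point if the key is past the last one.
    String serverForKey(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

With this layout, removing a server only reassigns the keys that server owned; keys owned by the remaining servers keep their mapping.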

Unfortunately, I have never seen documentation for spymemcached's host selection logic, so I recommend reading the code.

For you, here's the summary for relevant parts of the code:

@mumoshu
Thank you for providing a detailed description. If we decide to implement consistent hashing, I will certainly share it back with you so you can make it part of the plugin.
I would like to make sure I understand how the current implementation selects the target memcached host for a key. Is it something like this:
server = getServerForKey( key )

The above implementation will always map a given key to the same memcached server, as long as the number of servers does not change. So there is no risk of writing to one server and subsequently reading from a different server. The only issue I see is that it is not dynamic.
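
The "not dynamic" concern can be checked with a small experiment. The sketch below assumes getServerForKey is a hash-modulo lookup (an assumption, not confirmed in this thread) and counts how many of 1000 sample keys change server when the cluster grows from 2 to 3 hosts. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RemapCountSketch {
    // Assumed selection rule: unsigned key hash modulo the number of servers.
    static int indexForKey(String key, int serverCount) {
        return (int) ((key.hashCode() & 0xffffffffL) % serverCount);
    }

    // Counts keys whose server index differs between the two cluster sizes.
    static int countMoved(List<String> keys, int before, int after) {
        int moved = 0;
        for (String key : keys) {
            if (indexForKey(key, before) != indexForKey(key, after)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("key-" + i);
        }
        System.out.println("unchanged cluster: " + countMoved(keys, 2, 2) + " keys moved");
        System.out.println("2 -> 3 servers:    " + countMoved(keys, 2, 3) + " keys moved");
    }
}
```

Under this rule an unchanged cluster moves no keys, while resizing moves a substantial share of them, and each moved key shows up as a cache miss until it is written again.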

Thanks,

John