spotify/folsom

ReconnectingClient prints "Lost connection" for old nodes when another one is added - while requests go through just fine

alexeyOnGitHub opened this issue · 4 comments

I assume this is just a logging issue rather than a bug affecting client behavior.
I see this regularly -

say, I boot up 2 Memcached nodes: A and B. they are used by several app nodes (3 in my latest fault+perf test).
the total traffic sent to Memcached from these app nodes is 3000 RPS.
then I boot up one more Memcached node (C) and bring it into Load Balancer.
we use 3 seconds Dns refresh period in Memcached client setting, so the client recognizes that there are changes in the Memcached nodes list pretty quickly.

I see "Successfully connected" and "client connected: BinaryMemcacheClient(com.spotify.folsom.ketama.SrvKetamaClient" in logs, so that is all good.

but -
even though Memcached client keeps working fine (I can see requests in memcached tracking dashboards, no timeouts in the app, all good), I see these worrying messages in logs about 60 seconds after the new node is picked by the client. these messages refer to original Memcached nodes A and B, but not C.

INFO [2019-02-12 14:33:10,641] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to A:11211
INFO [2019-02-12 14:33:10,641] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to A:11211
INFO [2019-02-12 14:33:10,643] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to B:11211
INFO [2019-02-12 14:33:10,643] [folsom-default-scheduled-executor] [com.spotify.folsom.reconnect.ReconnectingClient] [] - Lost connection to B:11211

we use 2 connections per host, this is why you see two "lost connection" lines per node.

in other words, ReconnectingClient prints "Lost connection" for old nodes after its nodes list was refreshed and another node was picked up.

I am guessing there is no problem with the current client connections and those messages are just some delayed cleanup messages from old connections. if that is the case, it would be nice to change the logging to be more explicit. "lost connection" looks scary

Thanks for the report! It sounds it's possible that the old clients are still trying to reconnect. I'll investigate it further.

Actually... looks like this is intentional. When reconnecting, there's a 60 second shutdown delay by default: https://github.com/spotify/folsom/blob/master/folsom/src/main/java/com/spotify/folsom/ketama/SrvKetamaClient.java#L206

You can reconfigure the shutdown delay if you want.

right, my point is specifically about the log message.
I was thinking maybe the log message could be changed to make it explicit that the connection was not suddenly lost, but it was due to the planned shutdown. e.g. check if (shutdown) ... and then print different messages depending on whether it is a sudden connection loss vs. planned shutdown.

the message is somewhat confusing and scary otherwise.

I'll look into how much work it would be to introduce pending shutdowns and silence the logs for those.