zk-ruby/zk

Seemingly unconfigurable receive timeout

wwboynton opened this issue · 4 comments

Hello!

I've been experiencing a problem with timeouts in the zk gem. When I make a .children call on a very large zk system, I time out after exactly ten seconds.

I'm aware (now, anyway) that the connection timeout established in ZK::Client::Threaded is not the same timeout I'm looking for. I also noted this bit on line 63 of threaded.rb:

  # @note The `:timeout` argument here is *not* the session_timeout for the
  #   connection. Rather, it is the amount of time we wait for the connection
  #   to be established. The session timeout exchanged with the server is
  #   set to 10s by default in the C implementation, and as of version 0.8.0
  #   of slyphon-zookeeper has yet to be exposed as an option. That feature
  #   is planned.

However, I assume the session timeout is not what I'm looking for either.

Is the timeout I need to prevent the following trace exposed anywhere?

#<ZK::Exceptions::OperationTimeOut: inputs: {:path=>"/path/to/some/node"}>
inputs: {:path=>"/path/to/some/node"}
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/client/base.rb:1068:in `check_rc'
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/client/base.rb:1057:in `call_and_check_rc'
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/client/threaded.rb:581:in `call_and_check_rc'
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/client/base.rb:717:in `block in children'
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/event_handler.rb:282:in `setup_watcher!'
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/client/base.rb:1100:in `setup_watcher!'
/Users/wwboynton/.rvm/gems/ruby-1.9.3-p392/gems/zk-1.8.0/lib/zk/client/base.rb:716:in `children'
The trace above is the result of a zk.children call.

Thank you!

So, fair warning: it's been a long time since I've really dealt with this code. You may be running into the zookeeper gem's DEFAULT_RECEIVE_TIMEOUT_MSEC, which is 10s. That value was based on the default cluster timeout of 20s (at the time of writing).

You should try creating your ZK instance like:

zk = ZK.new(:receive_timeout_msec => 30_000) # 30s

and see if that helps.
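To check whether the setting actually changes anything, it can help to time the call and see how long it runs before failing. The following is only a rough sketch: `timed_children` and `elapsed_since` are hypothetical helpers (not part of zk or zookeeper), and the only thing assumed about the client is that it responds to `children(path)`, as in the trace above.

```ruby
# Hypothetical helper, not part of the zk gem: seconds elapsed since `started`,
# measured on the monotonic clock so wall-clock adjustments don't skew it.
def elapsed_since(started)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Hypothetical helper, not part of the zk gem: call client.children(path) and
# report how long the call ran, whether it succeeded or raised one of the
# given error classes. Any other exception propagates unchanged.
def timed_children(client, path, timeout_errors: [StandardError])
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  children = client.children(path)
  { ok: true, children: children, elapsed: elapsed_since(started) }
rescue *timeout_errors => e
  { ok: false, error: e.class.name, elapsed: elapsed_since(started) }
end

# Intended use with a real client (commented out so the sketch runs without
# the gem installed; the exception classes are the ones named in this thread):
#
#   zk = ZK.new(:receive_timeout_msec => 30_000) # 30s
#   result = timed_children(zk, "/path/to/some/node",
#     timeout_errors: [ZK::Exceptions::OperationTimeOut,
#                      Zookeeper::Exceptions::ContinuationTimeoutError])
#   puts result.inspect
```

If the reported `:elapsed` value tracks the configured timeout, the option is being honored and the call simply needs longer; if it stays near ten seconds regardless, the option is probably not reaching the underlying connection.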

Thanks for your suggestion! I'll absolutely give that a shot when I get into work and I'll report back on it.

Just FYI, I tried the receive_timeout_msec setting for this guide with the latest version (1.9.6) and it didn't seem to have an impact (I got Zookeeper::Exceptions::ContinuationTimeoutError after 30 seconds).

My recollection is that this trick worked at the time. That said, this library hasn't been given a significant update in years, and doesn't even really support Ruby 2.0. The potential for bitrot is tremendous, and you should probably find another library.