petergoldstein/dalli

Hanging when performing read_multi against large number of keys

Opened this issue · 4 comments

We're seeing an issue where the dalli client hangs trying to read ~100-200K keys at once. We were able to repro with the following code and a cache store configuration.

It looks similar to #776 but wanted to confirm and potentially add our use case given the age of that issue. We're going to mitigate on our end by dropping the batch size to 10K keys per call.

  • ruby 2.7.6p219
  • Rails 6.0.5.1
  • dalli (3.2.3)
  config.cache_store = :mem_cache_store,
    ENV.fetch('MEMCACHE_SERVERS', '127.0.0.1:11211'),
    { :namespace => "comp-#{Rails.env}" }
require 'timeout'

key_count = 2**10

while key_count < 10_000_000 do
    prefix = (rand * 2**64).to_i.to_s(32)
    puts "Prepping #{key_count} keys"
    keys = key_count.times.map { |i| "some_long_string:#{prefix}:#{i}" }
    t = Time.now
    puts "Starting set multi #{t}"
    Rails.cache.fetch_multi(*keys) { 0 }
    diff = Time.now - t
    puts "Completed in #{diff}s"
    t = Time.now
    puts "Starting multi get #{t}"
    Timeout::timeout(diff) {
        Rails.cache.read_multi(*keys) { 0 }
    }
    puts "Completed in #{Time.now - t}s"
    key_count *= 2
end
Prepping 65536 keys
Starting set multi 2022-12-09 17:36:08 +0000
Completed in 6.119117072s
Starting multi get 2022-12-09 17:36:14 +0000
Completed in 1.0013925s
Prepping 131072 keys
Starting set multi 2022-12-09 17:36:15 +0000
Completed in 12.960816923s
Starting multi get 2022-12-09 17:36:28 +0000
Traceback (most recent call last):
        41: from bin/rails:9:in `<main>'
        40: from bin/rails:9:in `require'
        39: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands.rb:18:in `<top (required)>'
        38: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/command.rb:46:in `invoke'
        37: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/command/base.rb:69:in `perform'
        36: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor.rb:392:in `dispatch'
        35: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor/invocation.rb:127:in `invoke_command'
        34: from /usr/local/lib/ruby/gems/2.7.0/gems/thor-1.2.1/lib/thor/command.rb:27:in `run'
        33: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands/runner/runner_command.rb:42:in `perform'
        32: from /usr/local/lib/ruby/gems/2.7.0/gems/railties-6.0.5.1/lib/rails/commands/runner/runner_command.rb:42:in `load'
        31: from railscache.rb:16:in `<top (required)>'
        30: from /usr/local/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
        29: from railscache.rb:17:in `block in <top (required)>'
        28: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:387:in `read_multi'
        27: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:686:in `instrument'
        26: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications.rb:180:in `instrument'
        25: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications/instrumenter.rb:24:in `instrument'
        24: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/notifications.rb:180:in `block in instrument'
        23: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:686:in `block in instrument'
        22: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache.rb:388:in `block in read_multi'
        21: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/strategy/local_cache.rb:131:in `read_multi_entries'
        20: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/mem_cache_store.rb:159:in `read_multi_entries'
        19: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:367:in `with'
        18: from /usr/local/lib/ruby/gems/2.7.0/gems/activesupport-6.0.5.1/lib/active_support/cache/mem_cache_store.rb:159:in `block in read_multi_entries'
        17: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `get_multi'
        16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `tap'
        15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:108:in `block in get_multi'
        14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:19:in `process'
        13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/ring.rb:78:in `lock'
        12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:20:in `block in process'
        11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:32:in `setup_requests'
        10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `make_getkq_requests'
         9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `each'
         8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:48:in `block in make_getkq_requests'
         7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `request'
         6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `synchronize'
         5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:18:in `block in request'
         4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:35:in `request'
         3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:210:in `pipelined_get'
         2: from /usr/local/lib/ruby/2.7.0/forwardable.rb:235:in `write'
         1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write'
/usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write': execution expired (Timeout::Error)

Also able to repro without rails via the following:

require 'timeout'
require 'dalli'

key_count = 2**10

dalli_client = Dalli::Client.new( ENV.fetch('MEMCACHE_SERVERS'), timeout: 0.5, error_when_over_max_size: true)
while key_count < 10_000_000 do
    prefix = (rand * 2**64).to_i.to_s(32)
    puts "Prepping #{key_count} keys"
    keys = key_count.times.map { |i| "some_long_string:#{prefix}:#{i}" }
    t = Time.now
    puts "Starting set multi #{t}"
    keys.each do |k|
        dalli_client.set(k, 0)
    end
    diff = Time.now - t
    puts "Completed in #{diff}s"
    t = Time.now
    puts "Starting multi get #{t}"
    Timeout::timeout(diff) {
        dalli_client.get_multi(keys)
    }
    puts "Completed in #{Time.now - t}s"
    key_count *= 2
end
Prepping 524288 keys
Starting set multi 2022-12-09 17:25:48 +0000
Completed in 40.370745713s
Starting multi get 2022-12-09 17:26:28 +0000
Completed in 1.500120584s
Prepping 1048576 keys
Starting set multi 2022-12-09 17:26:30 +0000
Completed in 81.036222996s
Starting multi get 2022-12-09 17:27:51 +0000
Traceback (most recent call last):
        20: from dalli.rb:20:in `<main>'
        19: from /usr/local/lib/ruby/2.7.0/timeout.rb:110:in `timeout'
        18: from dalli.rb:21:in `block in <main>'
        17: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `get_multi'
        16: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:107:in `tap'
        15: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/client.rb:108:in `block in get_multi'
        14: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:19:in `process'
        13: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/ring.rb:78:in `lock'
        12: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:20:in `block in process'
        11: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:32:in `setup_requests'
        10: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `make_getkq_requests'
         9: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:47:in `each'
         8: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/pipelined_getter.rb:48:in `block in make_getkq_requests'
         7: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `request'
         6: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:17:in `synchronize'
         5: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/options.rb:18:in `block in request'
         4: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:35:in `request'
         3: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/base.rb:210:in `pipelined_get'
         2: from /usr/local/lib/ruby/2.7.0/forwardable.rb:235:in `write'
         1: from /usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write'
/usr/local/lib/ruby/gems/2.7.0/gems/dalli-3.2.3/lib/dalli/protocol/connection_manager.rb:157:in `write': execution expired (Timeout::Error)

@pcorliss-provi I strongly suspect it's the same issue as #776 . I don't have a mitigation for that yet, other than limiting the number of keys fetched at a time using multi_get. It's possible to implement that batching internal to the client (i.e. break bigger batches into batches of ~10k, although it's slightly more complicated because of the ring), but I haven't done that. If you'd like to submit a PR it would be welcome.

@petergoldstein Funny that I seem to have discovered on the same day as @pcorliss-provi that we are affected by this / #776 as well. Increasing sndbuf and rcvbuf is a good workaround for now, but I would like to fix this properly.

I have started implementing interleaved reading/writing (which is what I suspect you mean by "batching internal to the client"), but it needs some polishing. If you want I can push a rough solution so we can have a discussion. I am especially not quite sure what you mean by "it's slightly more complicated because of the ring".

For now, do you want to close this as duplicate in favor of #776?

@marvinthepa Yes, interleaved reading/writing is basically what I was thinking. The ring (the data structure that distributes among the memcached servers) makes it more complicated since the buffer problem is per-memcached server. So ideally the interleaving would occur per server, but that requires breaking the ring abstraction. So it may be simpler to ignore this detail and batch as if it's a single server.

And yes, we can close this issue as a duplicate.