Instrumental/instrumental_agent-ruby

Sidekiq: "Exception occurred: can't alloc thread"

Hi,

An error occurs every now and then while Sidekiq is running. What could be the reason? I'm on DigitalOcean with the following setup:

Ubuntu 18.04.1 LTS

rails - 5.1.7
redis - 4.1.0
sidekiq - 5.2.7

sidekiq: 3 queues, 3 processes

redis_version: 4.0.9

Digital Ocean: CPU Optimized Droplets 8 GB, 4 vCPUs, 50 GB

From sidekiq.log:

E, [2019-07-20T22:53:30.849239 #9392] ERROR -- : Exception occurred: can't alloc thread
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/instrumental_agent-2.1.0/lib/instrumental/agent.rb:392:in `new'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/instrumental_agent-2.1.0/lib/instrumental/agent.rb:392:in `start_connection_worker'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/instrumental_agent-2.1.0/lib/instrumental/agent.rb:298:in `send_command'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/instrumental_agent-2.1.0/lib/instrumental/agent.rb:97:in `gauge'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/metrician-0.1.0/lib/metrician.rb:75:in `gauge'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/metrician-0.1.0/lib/metrician/reporters/redis.rb:20:in `ensure in call_with_metrician_time'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/metrician-0.1.0/lib/metrician/reporters/redis.rb:23:in `call_with_metrician_time'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:212:in `block in call_with_timeout'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:285:in `with_socket_timeout'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis/client.rb:211:in `call_with_timeout'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1179:in `block in _bpop'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `block in synchronize'
/home/martio/.rvm/rubies/ruby-2.6.3/lib/ruby/2.6.0/monitor.rb:230:in `mon_synchronize'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:50:in `synchronize'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1176:in `_bpop'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/redis-4.1.0/lib/redis.rb:1221:in `brpop'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/fetch.rb:36:in `block in retrieve_work'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq.rb:97:in `block in redis'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:65:in `block (2 levels) in with'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `handle_interrupt'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:64:in `block in with'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `handle_interrupt'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/connection_pool-2.2.2/lib/connection_pool.rb:61:in `with'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq.rb:94:in `redis'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/fetch.rb:36:in `retrieve_work'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:89:in `get_one'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:99:in `fetch'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:82:in `process_one'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:71:in `run'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:16:in `watchdog'
/home/xxx/apps/xxx/shared/bundle/ruby/2.6.0/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:25:in `block in safe_thread'

Is this Sidekiq Pro, by chance?

I found a bit here: sidekiq/sidekiq#4029

The underlying issue is probably that something being relied on for connection management is closing a connection earlier than the agent expects, but there isn't enough information in that log for me to say what.
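
To get more information the next time it happens, it might help to log some thread state from inside the Sidekiq process, since the trace ends in the `new` call inside `start_connection_worker`. The snippet below is only a rough sketch: `thread_report` is a hypothetical helper name, not part of instrumental_agent or Sidekiq, and it uses only Ruby's standard `Thread` and `Process` APIs. It assumes a Linux host, where `Process::RLIMIT_NPROC` is defined.

```ruby
# Hypothetical diagnostic helper (not part of any gem mentioned in this thread).
# Collects a snapshot of thread state for the current Ruby process.
def thread_report
  report = {
    total_threads: Thread.list.size,   # threads Ruby still tracks
    main_alive: Thread.main.alive?,    # false once the main thread has exited
    main_status: Thread.main.status
  }

  # RLIMIT_NPROC is platform-dependent, so only read it when it exists.
  if Process.const_defined?(:RLIMIT_NPROC)
    soft, hard = Process.getrlimit(Process::RLIMIT_NPROC)
    report[:nproc_soft_limit] = soft
    report[:nproc_hard_limit] = hard
  end

  report
end

puts thread_report.inspect
```

Calling this from wherever the error gets rescued or logged, and posting the output along with the surrounding lines of sidekiq.log (for example, whether the process was shutting down or restarting at the time), would give more to go on.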