zombocom/puma_worker_killer

Be more explicit about where to put PumaWorkerKiller.start

ponny opened this issue · 22 comments

"Somewhere in your main process run this code:"

Could you provide an example for Rails apps? I'm not sure where it would be best. Should it go in production.rb? application.rb? My Puma initializer? The unicorn-worker-killer gem suggests config.ru.

For a Rails app, I would recommend an initializer: config/initializers/puma_worker_killer.rb. Care to make a doc PR for me to add that in?

That doesn't seem to have worked. Memory usage hit the limit and no worker was killed. I have four workers on a box with 8 GB of RAM. Do these settings look correct?

In config/initializers/puma_worker_killer.rb:
PumaWorkerKiller.config do |config|
  config.ram = 8192 # mb
  config.frequency = 5 # seconds
  config.percent_usage = 0.80
end

PumaWorkerKiller.start

mhat commented

It seems like PumaWorkerKiller doesn't work with daemonize. As soon as Puma calls Process.daemon, PumaWorkerKiller's auto-reaping thread dies with the original process, and nothing restarts it in the new forked/detached process.
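The underlying Ruby behavior can be checked in isolation. Process.daemon forks, and Ruby's fork copies only the thread that called it into the child, so any background thread started earlier is left behind in the original process. The sketch below assumes MRI on a Unix-like OS; the "reaper" thread is a hypothetical stand-in for PumaWorkerKiller's loop, not the gem's code.

```ruby
# Sketch: a thread started before fork does not exist in the child process.
def thread_counts_across_fork
  # Hypothetical stand-in for the reaper loop (not PumaWorkerKiller's code).
  reaper = Thread.new { loop { sleep 0.1 } }

  reader, writer = IO.pipe
  pid = fork do
    reader.close
    # Only the thread that called fork is copied into the child.
    writer.puts Thread.list.size
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  writer.close
  child_count = reader.read.to_i
  reader.close
  Process.wait(pid)

  parent_count = Thread.list.size # main thread + reaper (at least)
  reaper.kill
  reaper.join
  [parent_count, child_count]
end

parent_count, child_count = thread_counts_across_fork
puts "threads in parent: #{parent_count}, threads in child: #{child_count}"
```

In the child only one thread is listed, which matches the symptom described above: the reaper keeps running only in the process that started it.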

@mhat can you give me a super simple app and some commands to reproduce so I can play around with this?

I'm having the same issue - puma worker killer is not actually running. I think @mhat has described the problem accurately.

@waziers can you give me a super simple app and some commands to reproduce so I can play around with this?

I can't provide an app (yet), but the core issue seems to be that PumaWorkerKiller runs in a process forked from the Puma master process, so it can't access the parent's values, namely the @workers instance variable.

I sprinkled a couple of print statements in my app and got this:

"Puma::Cluster#spawn_workers: self.object_id = 70208318574700"
"Puma::Cluster#spawn_workers: @workers.object_id = 70208318574680"
"PumaWorkerKiller::PumaMemory#set_workers: @master.object_id = 70208318574700"
"PumaWorkerKiller::PumaMemory#set_workers: @master.instance_variable_get("@workers").object_id = 70208317817100"

@master.instance_variable_get("@workers") returns an empty array no matter how many workers there are, so PumaWorkerKiller::PumaMemory#running? always returns false, and the reaper never runs.

It seems that the @workers instance variable gets overwritten in the master process after the worker processes have been forked (assuming a bigger object_id means a later allocation), but I haven't been able to pinpoint exactly where :(

A bit more investigation: @master.instance_variable_get("@workers").object_id matches that of the @workers variable of Puma::Cluster::Worker defined at https://github.com/puma/puma/blob/master/lib/puma/cluster.rb#L191

I'm out on vacation; can you ping me again on Monday?


On Mon, Feb 9, 2015 at 9:19 AM, Toms Mikoss notifications@github.com wrote:

@schneems @mhat @waziers I've created an example app for this: https://github.com/tmikoss/puma-worker-killer-example

@schneems ping. Also, I thought I had a working solution, but it appears to be broken as well. Basically, here's what I've found:

  1. The worker killer can't run in a worker process because of changes made in puma/puma@7c41d08.
  2. It can't run in the master process prior to daemonization because the thread gets killed off.
  3. It should be able to run in the master process after daemonization, but how?

I tried prepending the initializer into the guts of the Puma master process (see https://github.com/tmikoss/puma-worker-killer-example/blob/pseudo-solution/bin/puma). But either due to how daemonization is handled or how the puma gem is structured, both the runner and cluster files get reloaded after daemonization, and any changes get reverted.

Put this code in an initializer:

PumaWorkerKiller.config do |config|
  config.ram = 192 # mb
  config.frequency = 10 # seconds
  config.percent_usage = 0.80
end

PumaWorkerKiller.start

I just tried this locally and it seems to work:

$ foreman start web
14:38:56 web.1    | [71963] PumaWorkerKiller: Consuming 111.17578125 mb with master and 2 workers
14:39:06 web.1    | [71963] PumaWorkerKiller: Consuming 111.1796875 mb with master and 2 workers
14:39:16 web.1    | [71963] PumaWorkerKiller: Consuming 111.19140625 mb with master and 2 workers
14:39:26 web.1    | [71963] PumaWorkerKiller: Consuming 111.19140625 mb with master and 2 workers
14:39:36 web.1    | [71963] PumaWorkerKiller: Consuming 111.203125 mb with master and 2 workers
14:39:42 web.1    | [71965] 127.0.0.1 - - [17/Feb/2015:14:39:42 -0600] "GET / HTTP/1.1" 200 - 0.3134
14:39:42 web.1    | [71966] 127.0.0.1 - - [17/Feb/2015:14:39:42 -0600] "GET / HTTP/1.1" 200 - 0.3228
14:39:42 web.1    | [71966] 127.0.0.1 - - [17/Feb/2015:14:39:42 -0600] "GET / HTTP/1.1" 200 - 0.1711
14:39:42 web.1    | [71965] 127.0.0.1 - - [17/Feb/2015:14:39:42 -0600] "GET / HTTP/1.1" 200 - 0.0414
14:39:46 web.1    | [71963] PumaWorkerKiller: Out of memory. 2 workers consuming total: 266.0078125 mb out of max: 153.60000000000002 mb. Sending TERM to #<Puma::Cluster::Worker:0x007fa226a13ec8 @index=1, @pid=71966, @phase=0, @stage=:booted, @signal="TERM", @options={:min_threads=>5, :max_threads=>5, :quiet=>false, :debug=>false, :binds=>["tcp://0.0.0.0:3000"], :workers=>2, :daemon=>false, :before_worker_shutdown=>[], :before_worker_boot=>[#<Proc:0x007fa222dff108@config/puma.rb:11>], :after_worker_boot=>[], :worker_directory=>"/Users/richardschneeman/documents/projects/codetriage", :config_file=>"config/puma.rb", :mode=>:http, :on_restart=>[], :worker_timeout=>60, :worker_shutdown_timeout=>30, :rackup=>"config.ru", :environment=>"development", :preload_app=>true, :control_auth_token=>"d8d0c44d72f0bd520eb937cbea6d8e", :tag=>"codetriage", :logger=>#<Puma::Events:0x007fa222e3def8 @formatter=#<Puma::Events::PidFormatter:0x007fa222dfec30>, @stdout=#<IO:<STDOUT>>, @stderr=#<IO:<STDERR>>, @debug=false, @on_booted=[], @hooks={}>}, @first_term_sent=nil, @last_checkin=2015-02-17 14:39:41 -0600> consuming 85.671875 mb.

Using puma 2.11.0 and PWK from github master on Ruby 2.2
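The numbers in that log line are consistent with the limit being ram × percent_usage: with config.ram = 192 and config.percent_usage = 0.80, the limit is 153.6 MB, which Ruby prints as 153.60000000000002 because of floating-point rounding. The sketch below reproduces that check. The helper names are hypothetical, and choosing the largest worker to terminate is an assumption, not a statement about the gem's internals.

```ruby
# Sketch of the memory check implied by the log above. Helper names are
# hypothetical; this is not PumaWorkerKiller's actual implementation.

# Limit in MB: total RAM budget scaled by the allowed usage fraction.
def memory_limit_mb(ram_mb, percent_usage)
  ram_mb * percent_usage
end

# Returns the PID of the worker to send TERM to, or nil when under the limit.
# worker_mb maps worker PID => memory in MB.
# Assumption: the largest worker is the one chosen.
def worker_to_term(master_mb, worker_mb, ram_mb:, percent_usage:)
  total = master_mb + worker_mb.values.sum
  return nil if total <= memory_limit_mb(ram_mb, percent_usage)

  worker_mb.max_by { |_pid, mb| mb }.first
end

limit = memory_limit_mb(192, 0.80)
puts limit # prints 153.60000000000002, the "max" value shown in the log

# Hypothetical PIDs and sizes, for illustration only.
sample = { 101 => 85.7, 102 => 80.0 }
puts worker_to_term(100.3, sample, ram_mb: 192, percent_usage: 0.80).inspect # prints 101
```

Plugging in the settings from earlier in the thread (config.ram = 8192, config.percent_usage = 0.80) gives a combined limit of about 6553.6 MB.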

Sorry, I missed the part about daemonization. Can you give me some instructions on how you're booting Puma?

Add the --daemon flag on the command line, or the daemonize directive in the Puma config file.

For a runnable example, check out puma.rb in my reproduction repo - https://github.com/tmikoss/puma-worker-killer-example/blob/master/puma.rb

I looked at this and I know why it's happening now, but I don't know how to work around it.

My code needs access to the PIDs of the master process and all workers. It does this by traversing object space and finding the object that holds a reference to the master PID, this same object also has access to workers. From there I have everything I need.
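A minimal sketch of that lookup pattern, using a hypothetical FakeCluster class in place of Puma::Cluster so it runs without Puma. ObjectSpace.each_object and instance_variable_get are standard Ruby (each_object is most dependable on MRI); the class and method names here are illustrative only. It also shows why an empty @workers array, as reported earlier in the thread, makes running? return false.

```ruby
# Hypothetical stand-in for Puma::Cluster: it knows the master PID and @workers.
class FakeCluster
  def initialize(workers)
    @master_pid = Process.pid
    @workers = workers
  end
end

# Walk the object space and return the first live FakeCluster instance.
def find_master
  ObjectSpace.each_object(FakeCluster).first
end

# The reaper should only act when the master has workers to inspect.
def running?(master)
  return false if master.nil?

  workers = master.instance_variable_get("@workers")
  !workers.nil? && !workers.empty?
end

cluster = FakeCluster.new([:worker_0, :worker_1]) # keep a live reference
master = find_master
puts running?(master)              # true: two workers found
puts running?(FakeCluster.new([])) # false: empty @workers, as in the bug report
```

The lookup only works if it runs in the same process that owns the cluster object, which is exactly what daemonization breaks.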

Note: I added preload_app! to your config.

What happens when you run this without daemonizing is that a thread gets spawned in the master process and it runs there with no problem.

When you run with daemonizing, a process gets created, then the thread gets created and runs inside that process (PID 38143). You should see something that looks like this:

$ be puma -C puma.rb
[38143] Puma starting in cluster mode...
[38143] * Version 2.11.0 (ruby 2.2.0-p0), codename: Intrepid Squirrel
[38143] * Min threads: 1, max threads: 1
[38143] * Environment: development
[38143] * Process workers: 2
[38143] * Preloading application
[38143] * Listening on tcp://0.0.0.0:3000
[38143] ! WARNING: Detected 1 Thread(s) started in app boot:
[38143] ! #<Thread:0x007ff8848a9678@/Users/richardschneeman/.gem/ruby/2.2.0/gems/puma_worker_killer-0.0.3/lib/puma_worker_killer/auto_reap.rb:12 sleep> - /Users/richardschneeman/.gem/ruby/2.2.0/gems/puma_worker_killer-0.0.3/lib/puma_worker_killer/auto_reap.rb:15:in `sleep'
[38143] * Daemonizing...

I added some logging statements and turned on PID recording, so I can see that the master process runs as PID 38147, which is different from the PID the thread is running in. This is a result of calling Process.daemon(true) https://github.com/puma/puma/blob/41825c7cd1b149c2843d6127e37dbd07262a91f3/lib/puma/cluster.rb#L367. The process our thread is running in is now dead, and even if it weren't, it wouldn't have access to the master process.
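The PID change can be reproduced without Puma. This sketch (assuming MRI on a Unix-like OS) calls Process.daemon inside a throwaway child so the script itself is not detached, and reports the PID before and after daemonizing through a pipe.

```ruby
# Sketch: Process.daemon forks, so the daemonized code runs under a new PID.
def pids_around_daemon
  reader, writer = IO.pipe
  child = fork do
    reader.close
    before = Process.pid
    Process.daemon(true, true) # nochdir, noclose: keep cwd and stdio
    # From here on we are running in a different process.
    writer.puts "#{before} #{Process.pid}"
    writer.close
    exit!(0)
  end

  writer.close
  Process.wait(child) # the pre-daemon process exits once daemon has forked
  before, after = reader.read.split.map(&:to_i)
  reader.close
  [before, after]
end

before, after = pids_around_daemon
puts "PID before daemon: #{before}, after: #{after}"
```

Anything started in the "before" process, including a reaper thread, does not carry over to the "after" process.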

I tried adding the boot code to the on_worker_boot block, but as you guessed, it never gets called in the master (I added puts to output the PID):

=== puma startup: 2015-03-03 12:49:13 -0600 ===
PID - 38148
PID - 38149
[38147] - Worker 0 (pid: 38148) booted, phase: 0
[38147] - Worker 1 (pid: 38149) booted, phase: 0

Problem

We need to execute PumaWorkerKiller.start in the master process. It doesn't look like we can currently execute this code when running in daemonize mode. We would either need Puma to add a block that gets called explicitly in the master process before workers get spawned, or we could monkey patch a method like start_control, but that's pretty bad.

My question at this point would be how does unicorn worker killer deal with the same problem?

Hi @schneems, add PumaWorkerKiller.start to the before_fork block in the puma.rb config file. That works,
like this:
before_fork do
  require 'puma_worker_killer' # no-op if Bundler already loaded the gem

  ActiveRecord::Base.connection_pool.disconnect!
  PumaWorkerKiller.config do |config|
    config.ram = 2 * 1024 # mb
    config.frequency = 10 # seconds
    config.percent_usage = 0.96
    config.rolling_restart_frequency = 12 * 3600 # 12 hours in seconds
  end
  PumaWorkerKiller.start
end

Thanks! Could you send me a doc PR?

Done; please see #20.

Hi @schneems, I just found another problem with Puma when the start code is added to the before_fork block.
Adding it to before_fork makes Puma unable to stop or restart, which is a real Puma problem. Please check https://github.com/puma/puma/pull/846/commits and puma/puma#830.

The commits in #846 work; I have used them.

@schneems @robotJiang I'm also facing this issue with the before_fork block. I have to SSH in and delete that line from the state file before each deploy.

Are you doing rolling restarts? I think the issue is that once we have the "master" we never check for it again, so in the case of a rolling restart the object actually changes (or something weird like that).

I'm pretty sure the issue can be resolved by looking for invalid state and refreshing all objects: master, workers, etc.
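One way to sketch that idea: re-run the lookup whenever the cached master looks invalid (nil, or holding no workers) instead of memoizing it forever. MasterCache and ClusterStub are hypothetical names, and the lookup is passed in as a block so the sketch runs without Puma; it is not how PumaWorkerKiller is actually implemented.

```ruby
# Hypothetical cache that re-resolves the master when its cached copy looks stale.
class MasterCache
  def initialize(&lookup)
    @lookup = lookup # callable that finds the current master object
    @master = nil
  end

  # Returns the cached master, refreshing it first if it looks stale.
  def master
    @master = @lookup.call if stale?(@master)
    @master
  end

  private

  def stale?(master)
    return true if master.nil?

    workers = master.instance_variable_get(:@workers)
    workers.nil? || workers.empty?
  end
end

# Hypothetical stand-in for the cluster object that holds @workers.
class ClusterStub
  def initialize(workers)
    @workers = workers
  end
end

current = ClusterStub.new([:w0, :w1])
cache = MasterCache.new { current } # block sees later reassignments of current
first = cache.master

# Simulate a restart: the old object loses its workers and a new one replaces it.
first.instance_variable_set(:@workers, [])
current = ClusterStub.new([:w2, :w3])

second = cache.master
puts second.equal?(current) # true: the cache picked up the new master
```

Checking for an empty worker list is only one possible staleness signal; a real fix might also compare PIDs or catch errors when signalling a worker.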

I think this is cleared up, if not then please open a new issue.