questions about join/wait shutdown in master

Question

questions about join/wait shutdown in master

jjb opened this issue 4 years ago · 7 comments

why are both :wait and :join accepted? I looked in the code history and it doesn't seem to be for supporting previous behavior (but maybe I missed it) https://github.com/jmettraux/rufus-scheduler/blob/master/lib/rufus/scheduler.rb#L546-L563
looks like it accepts a limit but this isn't documented? i can make a PR for the readme if so
from my reading, the limit doesn't actually limit the worker threads, only the master thread. what's the utiluty of this? after worker threads are done, is there anything left for master thread to do? i guess if it's working on a non-threaded job?
if you do want to limit the worker threads you can take an approach like this (untested, just for discussion purposes)

def join_shutdown(opts)

  w = opts[:wait] || opts[:join]

  @paused_at = EoTime.now


  puts "telling all threads to shut down in #{limit} seconds or less"
  (work_threads.size * 2 + 1).times { @work_queue << :shutdown }

  shutdown_threads = []
  work_threads.each do |t|
    next if t == Thread.current
    shutdown_threads << Thread.new do
      Thread.new do
        sleep w
        puts "#{t} did not shut down in time, killing it"
        t.kill
      end
      t.join
    end
  end

  puts "waiting for threads to shut down"
  shutdown_threads.each{|t| t.join}

  @started_at = nil

  join(w.is_a?(Numeric) ? w : nil)
end

Answer 1 · 2020-12-26T02:49:02.000Z

Hello,

why are both :wait and :join accepted? I looked in the code history and it doesn't seem to be for supporting previous behavior (but maybe I missed it) https://github.com/jmettraux/rufus-scheduler/blob/master/lib/rufus/scheduler.rb#L546-L563

I can't remember.

looks like it accepts a limit but this isn't documented? i can make a PR for the readme if so

Indeed, I'd problably write that myself. it'd be faster.

from my reading, the limit doesn't actually limit the worker threads, only the master thread. what's the utiluty of this? after worker threads are done, is there anything left for master thread to do? i guess if it's working on a non-threaded job?

if you do want to limit the worker threads you can take an approach like this (untested, just for discussion purposes)

It got (more) complicated because of #304

Is there actually a problem you are experiencing?

Answer 2 · 2020-12-26T17:35:50.000Z

I am using rufus to make a "clock" process which is a cron replacement

I want to include something like this

Signal.trap("TERM") {
  puts "received TERM"
  puts scheduler.shutdown(wait: 29) # PaaS sends SIGKILL 30 seconds after sending SIGTERM
  exit
}

So, it would be nice if I could control my inner threads getting killed, with some logging, instead of just having them and the surrounding process be hard killed by my PaaS (although in most or maybe all cases, there might not be any actual semantic consequence, so scheduler.shutdown(:join) is probably fine)

Answer 3 · 2020-12-26T18:12:33.000Z

If I remember correctly, and seeing #304

scheduler.shutdown(wait: 29)

does already work.

I will double-check and document it.

Answer 4 · 2020-12-26T19:13:38.000Z

Here's how I am testing. with rufus-scheduler at master

clock.rb

puts "starting clock.rb: #{Time.now}"

puts $PROCESS_ID

require 'rufus-scheduler'

scheduler = Rufus::Scheduler.new

Signal.trap("TERM") {
  puts "received TERM"
  puts scheduler.shutdown(wait: 5)
  exit
}

scheduler.every '1 second', times: 1 do
  puts "job starting"
  20.times{|i| print '.'; sleep 1 }
  puts "job ending"
end

scheduler.join

terminal 1

ruby clock.rb

note PID

terminal 2

kill <PID from terminal 1 output>

this sends TERM

expected behavior: scheduler ends 5 seconds after TERM is sent

observed behavior: job only ends after all 20 iterations are complete

Answer 5 · 2020-12-26T21:15:14.000Z

@jjb wrote:

expected behavior: scheduler ends 5 seconds after TERM is sent
observed behavior: job only ends after all 20 iterations are complete

Next time, please state directly what you expect and what is observed at the top of the issue.

Thanks in advance.

Answer 6 · 2020-12-29T00:31:24.000Z

Thanks a lot for pointing at this issue!

Answer 7 · 2020-12-29T15:31:29.000Z

Awesome, thank you! I made this PR with a test improvement for the new code #315