socketry/async-container

Controller breaks on SIGHUP when ready isn't explicitly called

bryanp opened this issue · 4 comments

Discussed this with @ioquatix on Slack. Reproduce with the example code from the readme:

require 'async/container'

class Controller < Async::Container::Controller
  def setup(container)
    container.async do |task|
      while true
        puts "hello"
        task.sleep(1)
      end
    end
  end
end

controller = Controller.new

controller.run

Send SIGHUP to the main process. You should see something like this in terminal:

zsh: hangup     bundle exec ruby readme.rb

"hello" is still printed because the child process is still running. Hitting Ctrl-C does not stop the process at this point. Stopping the process through Activity Monitor causes this error:

13.52s    error: Async::Container::Process [pid=5891] [2020-04-14 09:22:11 -0700]
               |   Async::Container::Terminate: SIGTERM
               |   → /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:74 in `block (3 levels) in fork'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-1.24.2/lib/async/reactor.rb:204 in `select'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-1.24.2/lib/async/reactor.rb:204 in `run_once'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-1.24.2/lib/async/reactor.rb:234 in `run'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-1.24.2/lib/async/reactor.rb:56 in `run'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/generic.rb:168 in `block in async'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:77 in `block (2 levels) in fork'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:72 in `fork'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:72 in `block in fork'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:106 in `initialize'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:71 in `new'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/process.rb:71 in `fork'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/forked.rb:35 in `start'
               |     /Users/bryanp/.gem/ruby/2.6.5/gems/async-container-0.16.4/lib/async/container/generic.rb:134 in `block in spawn'

Here's a working version:

require 'async/container'

class Controller < Async::Container::Controller
  def setup(container)
    container.run(count: 1) do |instance|
      Async do |task|
        task.async do
          loop do
            puts "hello"

            task.sleep 1
          end
        end

        instance.ready!

        task.children.each(&:wait)
      end
    end
  end
end

controller = Controller.new

controller.run

Couple observations:

  1. We should consider a better way to handle the case where ready! is not called.
  2. I'm not entirely sure how ready! is to be called when using container.async.

So, this code has received updates to get it working I production but there are some rough edges as you have outlined. Thanks for that.

Readiness handling

ready! is going to be an essential part of the contract between parent and child process. Right now I don't have any semantics for handling failures except "either the entire thing has failed" or "it failed, try to restart it".

We need to introduce timeout handling and backoff if failures are happening repeatedly, and this state needs to be modelled via notifications upstream.

An example of this is a child which fails because of a syntax error, no amount of restarting will make it work. So we need to implement a policy for restarting: try 3 times, if it fails every time, put it on ice and inform the parent. Provide a mechanism to try failed children (e.g. SIGHUP).

Interface considerations

The container.async method can either be deprecated/removed or augmented to supply the right interfaces. It's essentially equivalent to running your own Async block in a child.

The logic was to provide something like container.async do ... async code ... end. So you don't worry how then container is working but just tell it to execute some code asynchronously. However it might give a false sense of security to user (e.g. shared mutable resources).

I'm interested in contributing timeout handling and backoff. Since you have an approach in mind, is this something you'd be able to hand off? Realistically, it would involve some knowledge transfer. But I'm actively working on some process-related things, so I do have some context.

What about changing Async::Container::Generic#async to this?

def async(**options)
  spawn(**options) do |instance|
    Async::Reactor.run(instance) do |task|
      yield instance, task
    end
  end
end

Yielding two arguments from async is a bit surprising, which might be an argument for removing the async method and letting the user do it themselves. Not sure where I land here.

I'm happy for you to take a stab at the backoff handling. But, it's going to be quite tricky to get right. So happy to help.

You will need to check changes with falcon too.

To avoid breaking existing code:

def async(**options)
  spawn(**options) do |instance|
    Async::Reactor.run do |task|
      instance.ready!
      yield task
    end
  end
end

However I'd also be okay with removing that method.

By the way we should probably add a spec to capture the broken behaviour before fixing it to confirm it's fixed, these are tricky issues and having coverage is really helpful.