mesos/mesos-go

executor shutdown callback couldn't complete properly

JiaminZhu opened this issue · 4 comments

driver.stop() in shutdown cause executor exit immediately without waiting for executor shutdown callback complete.
https://github.com/mesos/mesos-go/blob/master/api/v0/executor/executor.go#L484-L501

I'll put some context here:

Looking at the go driver code, looks like the driver does a stop() when it gets a shutdown message from the agent. This will likely cause the driver and the executor to exit immediately. This is different https://github.com/apache/mesos/blob/master/src/exec/exec.cpp#L428 from how the C++ driver responds to the shutdown message.

https://lists.apache.org/thread.html/e2d87b2e988a026cb497c3aabfc7b2c7d5632d8fdb5b09b503661f50@%3Cdev.mesos.apache.org%3E

Thanks @janisz

jdef commented

it sounds like you're looking for semantics more like this?

...
  t := time.NewTimer(shutdownGracePeriod)
  ch := make(chan struct{})
  driver.withExecutor(func(e Executor) {
    defer close(ch)
    e.Shutdown(driver)
  })
  // only stop the driver once Shutdown callback has completed
  go func() {
    defer t.Stop()
    select {
      case <-t.C:
      case <-ch:
    }
    // driver.stop() will cause process to eventually stop.
    driver.stop()
  }()

Thanks @jdef I'm thinking that we can keep driver.stop() in Shutdown callback, so that we won't have this issue. And you are right, we do need to keep a timer for shutdown grace period. Isn't executor_shutdown_grace_period one of mesos agent config? Why we need it here?