reactor/reactor-pool

Quiet period & timeout wait for waiting requests and active connections

Closed this issue · 4 comments

As discussed in https://gitter.im/reactor/reactor-netty?at=61431b9863dca8189161a517 it would be great if the quiet period and the timeout take the waiting requets and active connections into account.

Motivation

When dealing with stateless pods on Kubernetes or similar, they can be terminated any time. In this case we'd like to ensure a graceful shutdown. W hen using Spring with WebClient we'd also like WebClient to shutdown/close it's pools gracefully. This means waiting for all submitted (http) requests. For us it is a common scenario that some requests can take up to 60s before receiving any bit. They should still be waited for for the specified timeout. Currently, the connection pool does not seem to be waited for when closing.

Desired solution

When (gracefully) shutting down, the quiet period and the timeout should wait for requests and active connections.

Considered alternatives

In our Spring specific context we could attach a application event listener to our apps and then call disposeLoopsAndConnectionsLater via a delayed subscription to simulate a graceful shutdown. Alternatively, as advised in the chat, we could read the connection pool metrics for active connections and pending requests and fiddle in some graceful shutdown ourselves.

This is going to be a complicated one at Pool level. There are several things impeding such a change:

  1. the pool doesn't track acquire()d poolable objects. It just tracks the number of such acquired objects, and expects the user to call either invalidate() or (more likely) release() when done with it. Since that is in the hands of the user, there's no guarantee it will actually happen once the "graceful shutdown" has be requested.
  2. this is quite difficult to shoehorn in the current contract, because there is no notion of intermediate termination. Pool#dispose() states that A) pending acquires should be terminated with an onError and B) idle resources should be destroyed, and that is the contract.
  3. The current SimpleQueuePool implementation even goes further as to detect pool is shut down by checking the pending acquire queue is a special empty instance (so effectively, calling pool.shutdown() will "clear" the pending queue).

However, it should be possible to implement a decorator InstrumentedPool that adds this trait.
Could that be a good compromise in 0.2.x ?

@simonbasle The idea is similar to what Reactor Netty has for the event loop i.e.

	/**
	 * Returns a Mono that triggers the disposal of the underlying LoopResources when subscribed to.
	 * It is guaranteed that the disposal of the underlying LoopResources will not happen before
	 * {@code quietPeriod} is over. If a task is submitted during the {@code quietPeriod},
	 * it is guaranteed to be accepted and the {@code quietPeriod} will start over.
	 *
	 * @param quietPeriod the quiet period as described above
	 * @param timeout the maximum amount of time to wait until the disposal of the underlying
	 * LoopResources regardless if a task was submitted during the quiet period
	 * @return a Mono representing the completion of the LoopResources disposal.
	 * @since 0.9.3
	 **/
	default Mono<Void> disposeLater(Duration quietPeriod, Duration timeout) {

https://github.com/reactor/reactor-netty/blob/234d0e139f2f7e30d1ff7028e3256d96faa3a87a/reactor-netty-core/src/main/java/reactor/netty/resources/LoopResources.java#L183-L198

So thus we give some time to active resources (those that are acquired) to return to the pool and eventually those that are waiting also to try to acquire and do the work.

Basically you will delay the disposal of the resources with some time.

I need the desired behavior to be better described with regards to:

  • currently acquired resources
  • pending acquire monos (added to the queue because 0 idle resource, maxSize was reached so no allocation possible, and maxPendingAcquireSize was not yet reached)
  • impact of any incoming acquire() attempts once graceful shutdown has been started

I don't really expect there are implementors of Pool interface out there
and the new method could default to throwing a NotImplementedException I guess...

My biggest concern is about having to rewrite a good chunk of the internals to support this at Pool level, with the drawbacks of added complexity... That's why I'm offering a decorator approach (of which I already have a first draft in #149 ).

But is there at least a consensus on the behavior?
Does graceful shutdown universally means letting pending acquires run their course? Should incoming acquire reset the grace period? Or should they be directly rejected?

Thanks for your replies already. Just my personal opinion on the last questions

Does graceful shutdown universally means letting pending acquires run their course?

I would say so, yes. Even going as far as saying that would be the definition of a graceful shutdown for me.

Should incoming acquire reset the grace period? Or should they be directly rejected?

In my opinion, depends. I can see both ways being acceptable. E.g. when closing a threadpool, when submitting new tasks I think it's more common to reject newly submitted tasks. Resetting the grace period isn't required for me and rather unexpected, but not something I would dislike.