topfreegames/fluxcloud

Same context used for both exporters

Closed this issue · 3 comments

I found an issue with context.Context cancelation when having a large changelog and multiple exporters.

Fluxcloud uses the same Context for every exporters and use the net/http Context.

In our setup, we publish the changelog on two differents Slack channels and a webhook. The changelog is pretty large and a context cancelation occurs on the post to the second channel (because Flux closes the request due to timeout). All subsequents reporters are failing due to the context canceled by the caller.

Here the flux logs

caller=sync.go:434 component=daemon err="executing HTTP request: executing HTTP request: Post \"http://fluxcloud/v6/events\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

Here the fluxcloud logs

Could not post to slack:Post https://hooks.slack.com/services/TXXX/BXXX/XXXX: context canceled

I don't know how the interaction should work with Flux, but is it possible to define a timeout per exporters (by wrapping Context in WithTimeout and/or run the exporters calls asynchronously within a goroutine ? This way fluxcloud always respond OK to Flux, and logs tell us if something goes wrong, each exporter has its dedicated timeout and context that don't interfere with others.

Hey there @ViBiOh, first of all, thanks for the idea. I think it makes sense to be able to configure the timeouts individually. Checking the code the Http context is used here: https://github.com/topfreegames/fluxcloud/blob/master/pkg/exporters/slack.go#L92 and defined here: https://github.com/topfreegames/fluxcloud/blob/master/pkg/apis/v6.go#L35

Seems the current configuration doesn't allow custom context per exporters, but should be easy to implement at least an global timeout parameter.

WDYT?

After reflexion, setting the timeout for contexts used to notify exporter will not work, because the root Context, the *http.Request.Context() from the caller, will be canceled: flux closes the request based on its own timeout, which cancel the Context in fluxcloud.

I think the most appropriate way is to start a goroutine that declares its own context. Something like this:

for _, exporter := range config.Exporter {
  go func(exporter exporters.Exporter) {
    ctx, cancelFn := context.WithTimeout(context.Background(), time.Second * 30)
    defer cancelFn()

    exporter.Send(ctx ...)
  }(exporter)
}

The side-effect is that flux will always receive HTTP/200 status because we can't wait for exporter's response (it's the source of the problem, the caller timeout).

A trade-off can be to wait for all goroutine to ends (with a WaitGroup). Running exporters' calls in parallel can reduce overall notification time, but maybe not enough.

Edit: if there is opentracing, we should also handle it when creating new context.

Sorry for the delay @ViBiOh!

The side-effect is that flux will always receive HTTP/200 status because we can't wait for exporter's response (it's the source of the problem, the caller timeout).

Don't think its a good practice to always answer a HTTP/200.

A trade-off can be to wait for all goroutine to ends (with a WaitGroup). Running exporters' calls in parallel can reduce overall notification time, but maybe not enough.

Don't we end up in the same case, by having a timeout for the goroutines to end?