minio/minio-go

No retry loop on error net/http: timeout awaiting response headers and go 1.23

BChabanne opened this issue · 9 comments

After upgrading to go 1.23 we started to get some net/http: timeout awaiting response headers when using scaleway object storage.

This is due to a change in go net/http : golang/go@606b8ff
where http timeout requests are also context.DeadlineExceeded.

Because context.DeadlineExceeded is verified here

if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
this leads to the requests not being retried.

To reproduce this easily, it can be done by setting transport.ResponseHeaderTimeout = time.Nanosecond
With go 1.22 retry loop is entered while with go 1.23 it is not.

A quick and dirty fix for now is to disable the timeout

I do not think we should retry context.DeadlineExceeded here, because it will confuse the caller that already sets a deadline.

It looks like this error is coming from the transport, before you upgrade your project to go 1.23, you can copy DefaultTransport from transport.go, tweak it to make a higher header respone timeout and pass it to minio.Options{} when initializing a new MiniO client.

I understand why this condition is here and I agree that it should not be removed :

As a user, when calling minio-go functions with a WithDeadline(ctx), I want these functions to finish as soon as possible when the deadline is reached.

The change is probably more complicated than just removing this condition and I did not check yet how to implement it properly.

I think there are two kind of deadlines : the top deadline passed by user and bottom deadlines returned by "std/net" packages.
While stopping on "top" deadline is something needed, I think that it is better to enter the retry loop for "bottom" deadlines.

Changing Transport and passing it to minio.Options by increasing the header timeout (or totally disabling it) is not the best :
in the end an error would be returned while it could have been avoided

For example, in case of multipart put request, returning an error will force to do to a retry loop higher in the call stack and uploading the whole object from scratch instead of simply retrying the failing part.

The change in go 1.23 is that errors.Is(err, context.DeadlineExceeded) now returns true on timeout awaiting response so it revealed the problem to us.
But I guess fixing this issue would also fix broader errors : in case of dial error, the same situation would happen : errors.Is(err, context.DeadlineExceeded) would return true and the retry loop would not be entered.

What is your response header timeout? And what is the deadline set?

For now, I increased the response header timeout to 10 minutes (instead of minio.DefaultTransport 1 minute) and I did a retry loop wrapping FPutObject()

I do not use context.WithDeadline

what is the size of the object and how far are you when uploading to the scaleway storage?

The uploaded object was 4.6GiB and it was sent from inside scaleway infrastructure (ping : ~0.2ms)

The uploaded object was 4.6GiB and it was sent from inside scaleway infrastructure (ping : ~0.2ms)

were you not using multipart based upload? @BChabanne why would it take a 1min to upload each part?

I was using multipart upload. (for the record, the failing upload was randomly failing between partNumber 11 and 284)
The problem is probably not coming from uploading anyway because ResponseHeaderTimeout is started after the body is fully uploaded.
My speculation here is that somehow, the http requests get lost somewhere and no response is ever given.

With go 1.22 however, minio-go was resilient thanks to the retry loop.
However with go 1.23 as I explained above minio-go requires all PUT requests of a multipart upload to succeed without timeout instead of individually retrying failing one

Un-needed (and breaking) Go change in golang/go#50856 as you pointed out.

We want to retry if dial or something else times out, but not if caller context is canceled. We should check if the calling context is canceled and modify the signature to func isRequestErrorRetryable(ctx context.Context, err error) bool {.