cristalhq/hedgedhttp

Investigate Best Practices with HTTP2

joe-elliott opened this issue ยท 7 comments

hedgedhttp cancels the context of the request which appears to have some issues with HTTP2. I'm unsure if everything is working as intended or if modifications could be made to this library to improve interaction with HTTP2. To demonstrate the issue at hand (without using hedgedhttp):

	client := &http.Client{
		Transport: &http2.Transport{},
	}

	req, err := http.NewRequest("GET", "https://google.com", nil)
	if err != nil {
		panic(err)
	}

	ctx, cancel := context.WithCancel(context.Background())
	req = req.WithContext(ctx)
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}

	cancel()

	buffer := make([]byte, 1000000)
	_, err = resp.Body.Read(buffer)
	if err != nil && err != io.EOF {
		panic(err)
	}

	fmt.Println(string(buffer))

This code panics with error "context canceled" most of the time and occasionally panics with "unexpected EOF" (but never completes successfully).

Buried deep in the code http2 is using a pipe. The Read method is checking it's local member p.err which contains one of the above errors and it is returned all the way back to calling code. This is being set through CloseWithError which is triggered by the cancel() call.

CloseWithError is initiated (through a few layers) here:
https://github.com/golang/net/blob/abc453219eb5/http2/transport.go#L2287
Perhaps the transport/http2 pipe is reused across multiple http2 requests which is why I'm intermittently seeing it with the GCS client?

hedgedhttp is triggering this same behavior when interacting with GCS using the official GCS client.

Should we auto detect http2 and not cancel? Should we make cancelling optional? Should we just not cancel at all?

Thank you for filling this issue, I'm not a big pro of net/http internals, so cannot answer so quickly, that's why I'm not sure about which path is the best.

There is #6 which removes context cancelling at all (we just use original context inside request, without creating own), but also there is a sub-context in closure for runInPool and maybe the problem will be triggered there.

Sadly I don't have good bench/project/whatever where I can test it under real load (btw, can you share how much requests do you make when this problem appears?) and trying to debug in head and docs.

Also, jumping through the link from original PR I've found a thread on Google Groups, slightly changing the code I got something like this https://play.golang.org/p/iV2UMxZGPtr (you need to run it locally, url doesn't matter, feel free to change).

Locally this 'sometimes' appears to be 'always' ๐Ÿ‘€

I like the change in #6, but I think that we will need an optional cancel like was originally proposed. After you merge #6 I'll submit a PR for the optional cancel.

@joe-elliott idea is to cancel the context outside of the hedgehttp.

ctx, cancel := context.WithCancel(otherCtx)
defer cancel()

req := <create desired request>
req = req.WithContext(ctx)

// do hedgedhttp request
// read the body if it's present

// invoke cancel() here

In this way we're getting response, reading the body, canceling a context (it's scope is just to contain hedging), had a discussion with @storozhukBM yesterday, it may be slightly sub-optimal because we're not closing hanging requests as fast as possible, but it's a matter of 'how long do you process' the successfully received response.

I'm super eager to check how does #6 behaves in your situation and if it doesn't fix it, we can add optional cancelling or even cancelling failed requests manually (more code and more responsibility).

So, https://github.com/cristalhq/hedgedhttp/releases/tag/v0.3.0 is available, please test it ๐Ÿ˜‰

I have tested 0.4.0 on a http2 connection to www.google.com and with the gcs client. Worked like a charm with both.

Thank you for the work on this ๐Ÿ™