hashicorp/yamux

Using Session with http.Server can result in an infinite error loop in Accept

filipochnik opened this issue · 4 comments

Consider a situation when we have a client-server connection and a keepalive on the client side fails. The underlying connection gets closed without sending GoAway message.

Now, on the server side, there's a race who notices the connection got closed first and calls (*Session).exitErr(). If (*Session).recv wins (I think it also applies to (*Session).send) when (*Session).recvLoop is in the io.ReadFull call, the (*Session).exitErr() is called with "connection reset by peer" net.Error. That error is written to Session.shutdownErr and will be returned by (*Session).Accept on all subsequent calls.

And here's the problem. If that server Session is used as a net.Listener with http.Server it will cause an infinite error loop. That's because "connection reset by peer" is a temporary error and those are retried indefinitely, see here.

Ideally, all yamux errors should implement net.Error and set Temporary() and Timeout() appropriately, but that would be a breaking API change.
The backward compatible solution is to never return errors that implement net.Error because the context where they originally happen and the context where they are returned by yamux differ and lead to erroneous situations like this one.
In practical terms, it simply means wrapping errors passed to (*Session).exitErr() if they are not yamux errors.

I can submit a PR for this but I would first like to know if that's something you would accept.

yudai commented

@filipochnik Thank you for sharing. I hit the same issue. Would you mind if I asked you to share your fix?

@yudai
I ran a separate goroutine waiting on the session's close channel and then closing the server. Something like:

go func() {
	<-ln.(*yamux.Session).CloseChan()
	srv.Close()
}()
yudai commented

@filipochnik I see. That's a neat idea that doesn't require changing yamux itself. Thanks!

NetError now implements Temporary and Timeout, so this should be fixed.