droyo/styx

Consuming IO messages across decoder buffer boundary corrupts decoder state

Closed this issue · 1 comments

droyo commented

One of the constraints I made when writing the styxproto package was that it use a fixed-size buffer per connection. Another, perhaps unnecessary constraint was to support very small (~.5KB) and very large (2GB) msize parameters, independent of the internal buffer size. One consequence that came out of these constraints is that the two largest messages, Rread and Twrite, do not support random access. Instead they implement the io.Reader interface.

When one of these messages does not fit completely in the Decoder buffer, io.MultiReader is used to combine the buffered data (using a bytes.Reader) with an *io.LimitedReader that consumes from the underlying connection, like so:

m.r = bytes.NewReader(buffered)
if int64(len(buffered)) < count {
	m.r = io.MultiReader(
		m.r,
		io.LimitReader(r, count-int64(len(buffered))))
}

When I wrote this, I thought I was being clever. Unfortunately, as far as I can tell, it's never worked. This would only manifest when reading or writing large files.

After doing a bit of troubleshooting, I believe the problem is in decoder.go, here:

func (s *Decoder) Next() bool {
	if s.msg != nil {
		s.err = discard(s.br, s.msg.Len())

discard() gets rid of msg.Len() bytes from the bufio.Reader, but the underlying connection has already been advanced by msg.Len() - len(buffered) bytes.

droyo commented

One complication is that this only occurs when the Read method is called on one of these messages. If all messages were fully consumed, this would be as simple as replacing discard(s.br, s.msg.Len()) with discard(s.br, len(s.msg.bytes())). But since we can't guarantee that, we have to preface it with io.Copy(ioutil.Discard, s.msg).