ncw/swift

ObjectOpenFile.Read always reads 4096 bytes?

dmolesUC opened this issue · 4 comments

With Amazon's S3 API, I can use the HTTP Range: header to read objects in chunks of arbitrary size (in this case 5 MiB), as seen in this code.

But when I try something similar in Swift, each Read call returns only 4096 bytes, regardless of the buffer size.

I tried setting the Range: header explicitly in ObjectOpen and opening a new ObjectOpenFile for each 5 MiB chunk, but this didn't help.
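For reference, here is a sketch of how the per-chunk Range values for 5 MiB chunks could be computed. It is illustrative only: rangeHeaders and defaultChunkSize are hypothetical names, and it assumes each value would be passed as the Range entry in the headers given to ObjectOpen. HTTP byte ranges are inclusive on both ends, which is why each chunk ends at start+chunkSize-1.

```go
package main

import "fmt"

// defaultChunkSize mirrors the 5 MiB chunks mentioned above (hypothetical name).
const defaultChunkSize int64 = 5 * 1024 * 1024

// rangeHeaders returns one HTTP Range header value per chunk for an object of
// objectSize bytes. Ranges are inclusive, so a chunk starting at start ends at
// start+chunkSize-1, clamped to the last byte of the object.
func rangeHeaders(objectSize, chunkSize int64) []string {
	var ranges []string
	for start := int64(0); start < objectSize; start += chunkSize {
		end := start + chunkSize - 1
		if end > objectSize-1 {
			end = objectSize - 1
		}
		ranges = append(ranges, fmt.Sprintf("bytes=%d-%d", start, end))
	}
	return ranges
}

func main() {
	// A 12 MiB object splits into two full 5 MiB chunks and one 2 MiB chunk.
	for _, r := range rangeHeaders(12*1024*1024, defaultChunkSize) {
		fmt.Println(r)
	}
}
```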

Currently I'm just reading the whole object at whatever rate ObjectOpenFile.Read returns it, but I'm concerned about the overhead. If I'm actually making a new HTTP request every 4 KiB on a multi-gigabyte file, that adds up. It also seems like it would create more opportunities for dropped connections, retries, etc., though that may not be true in practice.

(That said, I'm not sure whether there's actually a new behind-the-scenes HTTP request every 4 KiB, or if that's just io.ReadSeeker trying to be helpful.)

Is there a way to specify/increase the chunk size?

ncw commented

I'm not 100% sure why this is happening. ObjectOpenFile.Read is a thin wrapper around http.Response.Body.Read when checkHash is false, which it is in your case.

Note the last sentence from the io.Reader docs:

> Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.

It is quite possible that there was only 4k of data available right then.

I think therefore that ObjectOpenFile.Read is acting correctly.

You can use this little wrapper function to fill the buffer:

```go
// ReadFill reads as much data from r into buf as it can
//
// It reads until the buffer is full or r.Read returned an error.
//
// This is io.ReadFull but when you just want as much data as
// possible, not an exact size of block.
func ReadFill(r io.Reader, buf []byte) (n int, err error) {
	var nn int
	for n < len(buf) && err == nil {
		nn, err = r.Read(buf[n:])
		n += nn
	}
	return n, err
}
```

You can also use io.ReadFull, but read its docs really carefully, as there are a number of gotchas!

dmolesUC commented

Yeah, I saw that same section in the docs and thought maybe there was an internal buffer just waiting for the Swift server to push out 4K worth of response body. Thanks for the code snippet, that looks helpful, as does the pointer to io.ReadFull.

Am I right in thinking I'm only making one HTTP request per Connection.ObjectOpen?

ncw commented

> Am I right in thinking I'm only making one HTTP request per Connection.ObjectOpen?

Yes, each Open should make one HTTP request.

dmolesUC commented

I updated my code to make a separate ObjectOpen for each ranged request, and, since I already know exactly how many bytes to expect, use io.ReadFull to fill the buffer. Works like a charm. Thanks again!