rhnvrm/simples3

detectFileSize does not measure size of the entire stream

Closed this issue · 2 comments

Hello,

When uploading small objects to my local Minio instance, I got the following error from s3.FileUpload()

400 Bad Request: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>EntityTooSmall</Code><Message>Your proposed upload is smaller than the minimum allowed object size.</Message><BucketName>u9k-dev</BucketName><Resource>/u9k-dev</Resource><RequestId>16354459047EE460</RequestId><HostId>4a57eb7d-dd77-496f-aee1-f604ca81f4f8</HostId></Error>

This was strange because I should also be able to upload small files.
I figured out that it was happening after I added a content type detection function in my code, which does the following:

// adapted from https://golangcode.com/get-the-content-type-of-file/
func getFileContentType(r io.Reader) string {
	// Only the first 512 bytes are used to sniff the content type.
	buffer := make([]byte, 512)

	_, err := r.Read(buffer)
	if err != nil {
		log.Printf("Failed to detect Content-Type: %s\n", err)
		return "application/octet-stream"
	}

	// Use the net/http package's handy DectectContentType function. Always returns a valid
	// content-type by returning "application/octet-stream" if no others seemed to match.
	contentType := http.DetectContentType(buffer)
	return contentType
}

Nothing complicated, just reads the first 512 bytes of the filestream and analyzes it.

But this breaks s3.FileUpload, in particular the detectFileSize routine, because it does not count from the beginning. I believe this is a bug.

	pos, err := body.Seek(0, 1) // this does not do anything, because it is seeking 0 bytes relative to the current position (1 = io.SeekCurrent)
	if err != nil {
		return -1, err
	}
	defer body.Seek(pos, 0)

	n, err := body.Seek(0, 2)
	if err != nil {
		return -1, err
	}
	return n, nil

See https://godoc.org/io#Seeker for reference.

A version of the function that calculates the size of the entire stream would look like this:

	pos, err := body.Seek(0, io.SeekStart)
	if err != nil {
		return -1, err
	}
	defer body.Seek(pos, 0)

	n, err := body.Seek(0, io.SeekEnd)
	if err != nil {
		return -1, err
	}
	return n, nil

Please let me know (and document) if this is intended behavior.

func detectFileSize(body io.Seeker) (int64, error) {

Hey

Thanks for raising the issue.

This is intended behavior as per my understanding. At the time of implementing the library, I think I had thought, that a library should ideally not seek and change the intended location since we are passing an interface and caller could have a different intention with the underlying implementation (apart from os.File).

// this does not do anything, because it is seeking 0 bytes relative to the current position (1 = io.SeekCurrent)

The first seek, is used to get the pos to revert to after we are done calculating the size using the defer statement.

This was also based on how it was being done by the AWS SDK (in a similar manner):

https://github.com/aws/aws-sdk-go/blob/bcb2cf3fc2263c8c28b3119b07d2dbb44d7c93a0/aws/types.go#L130

func seekerLen(s io.Seeker) (int64, error) {
	curOffset, err := s.Seek(0, sdkio.SeekCurrent)
	if err != nil {
		return 0, err
	}

	endOffset, err := s.Seek(0, sdkio.SeekEnd)
	if err != nil {
		return 0, err
	}

	_, err = s.Seek(curOffset, sdkio.SeekStart)
	if err != nil {
		return 0, err
	}

	return endOffset - curOffset, nil
}

Just checked a few places where simples3 is used in handlers. After trying to get the MIMEType for validation, I can see users have done a seek back using .Seek(0,0) after the mime detection function.

I think we can add some documentation here: https://github.com/rhnvrm/simples3/blob/master/simples3.go#L44

Apart from this, having gone through the implementation in aws-sdk-go just now again, I can see that there might be a potential bug in the current implementation. Since .Seek returns current offset and not the number of bytes seeked. I think the current logic seems to return an incorrect value in your case and might need to be changed to how the sdk does it. This might have caused the size mismatch you posted. I will investigate if this is actually the case and get back.

Hey, thanks for getting back to me.

that a library should ideally not seek and change the intended location since we are passing an interface and caller could have a different intention with the underlying implementation

I fully agree with this and also think now (thanks to your explanations) that this is the right behavior.
What confused me was the fact that the Seek position is apparently modified by my detectContentType function, which does not even take an io.Seeker, but just an io.Reader!

After trying to get the MIMEType for validation, I can see users have done a seek back using .Seek(0,0) after the mime detection function.
I guess I will need to use io.ReadSeeker instead of io.Reader for the detectContentType function, just to be able to go back to the beginning.

Also, neither io.Reader.Read() nor io.Seeker.Seek() document that reading actually advances the seek position:
https://godoc.org/io#Reader

I'm happy to close this issue now.