minio/minio-dotnet

Download large files

dmpriso opened this issue ยท 10 comments

When using async semantics to get a large file, the whole file must be read into RAM, and a threadpool thread is blocked while copying. This is especially an issue in a server environment.

One might expect the following to work:

    Stream stream = null;

    await m_client.GetObjectAsync(this.m_bucket,
                path,
                s => stream = s,
                cancellationToken);

    var read = await stream.ReadAsync(...); // fails: the stream is already closed here

However, that fails because the stream "s" must be consumed inside the callback. This is because RestSharp closes the request immediately after the callback returns.
There is actually a closed RestSharp issue about this:

restsharp/RestSharp#539

While that may be an "edge case" for RestSharp, it is clearly an issue here where we could easily be dealing with large files. The only way to read a file is as in the example:

    var stream = new MemoryStream();

    await m_client.GetObjectAsync(this.m_bucket,
                path,
                s => s.CopyTo(stream),
                cancellationToken);

    stream.Position = 0;
    var data = stream.ToArray(); ...

However, that buffers the entire object in RAM and blocks a threadpool thread when run in a service.

I've noticed this behaviour too. I use minio in development environments, and Google Cloud Storage in production environments. With the Google storage client I can directly pass the stream that I want to download the file to, i.e. DownloadObjectAsync(bucket, path, destinationStream, cancellationToken), and this works pretty well.
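For comparison, here is a rough sketch of that pattern with the Google.Cloud.Storage.V1 client (`bucketName`, `objectName`, and `destinationStream` are placeholders for your own values):

```csharp
using Google.Cloud.Storage.V1;

// Sketch of the pattern described above: the destination stream is
// passed in and the client writes the object into it chunk by chunk,
// so the whole object never has to be buffered in memory.
var storage = await StorageClient.CreateAsync();
await storage.DownloadObjectAsync(
    bucketName, objectName, destinationStream,
    cancellationToken: cancellationToken);
```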

Just an idea: would switching to the built-in HttpClient provided in .NET Core help here? I'm not suggesting completely removing RestSharp, just for the download file methods.

The implementation could change to read from the input stream into a buffer, and copy from that buffer to the destination, as seen here. This is very similar to what the Google Cloud Storage client does under the hood.
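A minimal sketch of that approach, assuming plain HttpClient (the method names here are illustrative, not the library's actual API):

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class StreamingDownload
{
    // Sketch only: copy from a source stream to a destination stream
    // through a small reusable buffer, so the whole object never sits
    // in memory at once.
    public static async Task CopyInChunksAsync(
        Stream source, Stream destination, CancellationToken ct)
    {
        var buffer = new byte[81920]; // same default chunk size CopyToAsync uses
        int read;
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length, ct)) > 0)
        {
            await destination.WriteAsync(buffer, 0, read, ct);
        }
    }

    // Hypothetical download method built on HttpClient.
    public static async Task DownloadToStreamAsync(
        HttpClient http, string url, Stream destination, CancellationToken ct)
    {
        // ResponseHeadersRead makes HttpClient return before buffering the
        // body, which is the key difference from the RestSharp behaviour.
        using (var response = await http.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead, ct))
        {
            response.EnsureSuccessStatusCode();
            using (var source = await response.Content.ReadAsStreamAsync())
            {
                await CopyInChunksAsync(source, destination, ct);
            }
        }
    }
}
```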

thanks @twgraham, that's worth exploring

We will take a look at this in the next milestone. @twgraham, we are wrapping up our current priorities first. I will close this and park it in a future milestone; we will reactivate it from there once we are done with our current priorities.

+1. Yes please. Downloading large files (at least 1GB+; note: a self-applied restriction in my app) is currently not possible because of this. The official S3 SDK supports this, as you can wrap the GetObjectAsync response inside a using, e.g. https://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingNetSDK.html
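For reference, the AWS SDK for .NET pattern that page describes looks roughly like this (sketch; `bucketName`, `key`, and `destinationStream` are placeholders):

```csharp
using System.IO;
using Amazon.S3;
using Amazon.S3.Model;

// Sketch of the AWS SDK for .NET pattern: the response object owns the
// network stream, and disposing it closes the connection, so the caller
// can read ResponseStream lazily inside the using blocks.
using (var client = new AmazonS3Client())
using (GetObjectResponse response = await client.GetObjectAsync(bucketName, key))
using (Stream responseStream = response.ResponseStream)
{
    await responseStream.CopyToAsync(destinationStream);
}
```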

Please consider reopening this issue: for MinIO's main use case it seems high priority to support downloading files asynchronously and to stream files without loading the whole content into memory. A closed issue does not encourage improving this in the near future.

Is this still planned? I thought it was strange when I first saw that you have to provide a callback to access the stream; I guess that's just because of RestSharp?

IMO this makes the client library pretty limiting.

Still a problem. One has to state that it is clearly NOT production ready when it comes to larger files paired with the dotnet client.

Why is this issue closed?

My async wrapper that gets rid of both the callbacks and copying the whole file into RAM: https://stackoverflow.com/a/77158662/3334359
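One way such a wrapper can work (a sketch of the general pattern, not necessarily the linked answer, assuming the callback overload of GetObjectAsync discussed in this thread) is to bridge the callback to a System.IO.Pipelines pipe: the callback pumps data into the pipe's writer on a background task, and the caller gets the reader side back as an ordinary Stream it can read lazily:

```csharp
using System;
using System.IO;
using System.IO.Pipelines;
using System.Threading;
using System.Threading.Tasks;
using Minio;

static class MinioStreamWrapper
{
    // Sketch: expose the callback-based GetObjectAsync as a Stream.
    // The callback copies into the pipe on a background task; the
    // returned stream is the pipe's read side, so data flows through
    // a bounded buffer instead of being held in RAM all at once.
    public static Stream OpenObjectStream(
        MinioClient client, string bucket, string objectName, CancellationToken ct)
    {
        var pipe = new Pipe();

        _ = Task.Run(async () =>
        {
            try
            {
                await client.GetObjectAsync(bucket, objectName,
                    s => s.CopyTo(pipe.Writer.AsStream()), ct);
                await pipe.Writer.CompleteAsync();
            }
            catch (Exception ex)
            {
                // Surface the failure to the reader instead of hanging it.
                await pipe.Writer.CompleteAsync(ex);
            }
        }, ct);

        return pipe.Reader.AsStream();
    }
}
```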

This issue should not be closed, downloading large files without loading them entirely into memory is a base functionality for using MinIO on production.

Returning a Stream object referring to the file contents in MinIO is required to make it work with ASP.NET, which allows downloading files via return File(stream, mimeType); in a controller.
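That usage would look something like this (sketch; `GetObjectStreamAsync` is hypothetical and stands in for any API that returns a lazily-read Stream over the object contents):

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Sketch: an ASP.NET Core action that streams an object straight to the
// HTTP client without buffering it in the server's memory.
[HttpGet("files/{name}")]
public async Task<IActionResult> Download(string name, CancellationToken ct)
{
    // Hypothetical method returning a Stream over the object contents.
    Stream stream = await _storage.GetObjectStreamAsync(_bucket, name, ct);
    // File() disposes the stream after the response has been written.
    return File(stream, "application/octet-stream", name);
}
```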

Giving access to the Stream via a callback is extremely weird and unusable. You guys should look at it if you want to be taken seriously.