"upload_fileobj" downloads entire file before uploading
Closed this issue · 3 comments
- Async AWS SDK for Python version:
- Python version: 3.10
- Operating System: Docker image (Debian GNU/Linux 11 (bullseye))
- aioboto3 version: 9.3.1
Description
Hi! When I try to upload a stream, the entire file is buffered in memory before it is uploaded. Is there any way to fix this?
Here are the results from the memory profiler (https://github.com/bloomberg/memray); the file size is ~88MB:
Allocations results for test_upload_big_file_between_buckets:
📦 Total memory allocated: 96.7MiB
📏 Total allocations: 372513
📊 Histogram of allocation sizes:
🥇 Biggest allocating functions:
- file_reader:/usr/local/lib/python3.10/site-packages/aioboto3/s3/inject.py:270 -> 88.0MiB
- raw_decode:/usr/local/lib/python3.10/json/decoder.py:353 -> 3.2MiB
- raw_decode:/usr/local/lib/python3.10/json/decoder.py:353 -> 2.4MiB
- init:/usr/local/lib/python3.10/site-packages/aiobotocore/httpsession.py:89 -> 1.1MiB
- _create_api_method:/usr/local/lib/python3.10/site-packages/botocore/client.py:397 -> 1.0MiB
There is no such problem with the sync version, boto3 (see the comparison sketch after the snippet below).
What I Did
# 'session' is an aioboto3 Session; 'source' is presumably the response of an
# earlier get_object call whose streaming Body is re-uploaded.
async with session.client(
    's3',
    aws_access_key_id=<>,
    aws_secret_access_key=<>,
    region_name=<>,
) as s3_client:
    async with source['Body'] as raw_stream:
        await s3_client.upload_fileobj(
            raw_stream,
            Bucket='bucket',
            Key='filename',
        )
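For reference, the sync call being compared against might look like the sketch below. This is a minimal, hypothetical example (the file name, bucket, and key are placeholders, not taken from the report); boto3's upload_fileobj performs a managed multipart transfer rather than reading the whole source into memory first.

# Minimal boto3 comparison (sketch; names are placeholders).
# boto3's upload_fileobj streams multipart parts from the file object
# instead of loading the entire file into memory at once.
import boto3

s3_client = boto3.client('s3')
with open('bigfile.bin', 'rb') as f:
    s3_client.upload_fileobj(f, Bucket='bucket', Key='filename')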
You'll want to tweak these values: https://github.com/terrycain/aioboto3/blob/master/aioboto3/s3/inject.py#L202
It's possible s3transfer now uses different values, since I copied them years ago; see if setting the max io queue to 2 does anything.
The chunk size of the multipart parts being uploaded is 8MB, and it'll read up to 100 of them before pausing, so I'm not too surprised. A sketch of tuning those values follows below.
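A minimal sketch of that tuning, assuming upload_fileobj accepts a boto3 TransferConfig through its Config parameter (as boto3's does) and that the reduced max_io_queue is honoured; the chunk size and queue depth shown are illustrative, not prescribed values:

import aioboto3
from boto3.s3.transfer import TransferConfig

# Bound how many multipart parts can sit in memory at once.
# Values are illustrative; 8MB is the default part size.
transfer_config = TransferConfig(
    multipart_chunksize=8 * 1024 * 1024,  # 8MB parts
    max_io_queue=2,                       # queue at most 2 parts before pausing reads
)

async def upload_stream(raw_stream):
    session = aioboto3.Session()
    async with session.client('s3') as s3_client:
        # Assumes Config= is accepted here, mirroring boto3's upload_fileobj.
        await s3_client.upload_fileobj(
            raw_stream,
            Bucket='bucket',
            Key='filename',
            Config=transfer_config,
        )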
> When I try to upload a stream, the entire file is buffered in memory before it is uploaded. Is there any way to fix this?
This is not always true. The default settings will load 80MB into memory and then upload it. If you are uploading a file smaller than 80MB, then yes, the entire file will be loaded into memory; but for files larger than 80MB it is a streaming algorithm. So there is nothing wrong with the code.
The s3.upload_fileobj now mirrors the S3 Transfer implementation more closely. As for the initial issue, this is working as designed.