"upload_fileobj" downloads entire file before uploading
Closed this issue · 3 comments
- Async AWS SDK for Python version:
- Python version: 3.10
- Operating System: Docker image (Debian GNU/Linux 11 (bullseye))
- aioboto3 version: 9.3.1
Description
Hi! When I try to upload a stream, the entire file is buffered in memory before it is uploaded. Is there any way to fix this?
Here are the results from the memory profiler (https://github.com/bloomberg/memray); the file size is ~88MB:
Allocations results for test_upload_big_file_between_buckets:
📦 Total memory allocated: 96.7MiB
📏 Total allocations: 372513
📊 Histogram of allocation sizes:
🥇 Biggest allocating functions:
- file_reader:/usr/local/lib/python3.10/site-packages/aioboto3/s3/inject.py:270 -> 88.0MiB
- raw_decode:/usr/local/lib/python3.10/json/decoder.py:353 -> 3.2MiB
- raw_decode:/usr/local/lib/python3.10/json/decoder.py:353 -> 2.4MiB
- init:/usr/local/lib/python3.10/site-packages/aiobotocore/httpsession.py:89 -> 1.1MiB
- _create_api_method:/usr/local/lib/python3.10/site-packages/botocore/client.py:397 -> 1.0MiB
There is no such problem with the sync version, boto3 (see the comparison sketch after the snippet below).
What I Did
# 'session' is an aioboto3 Session; 'source' is presumably the response of an
# earlier get_object call whose streaming Body is re-uploaded.
async with session.client(
    's3',
    aws_access_key_id=<>,
    aws_secret_access_key=<>,
    region_name=<>,
) as s3_client:
    async with source['Body'] as raw_stream:
        await s3_client.upload_fileobj(
            raw_stream,
            Bucket='bucket',
            Key='filename',
        )
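For reference, the sync call being compared against might look like the sketch below. This is a minimal, hypothetical example (the file name, bucket, and key are placeholders, not taken from the report); boto3's upload_fileobj performs a managed multipart transfer rather than reading the whole source into memory first.

# Minimal boto3 comparison (sketch; names are placeholders).
# boto3's upload_fileobj streams multipart parts from the file object
# instead of loading the entire file into memory at once.
import boto3

s3_client = boto3.client('s3')
with open('bigfile.bin', 'rb') as f:
    s3_client.upload_fileobj(f, Bucket='bucket', Key='filename')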
You'll want to tweak these values: https://github.com/terrycain/aioboto3/blob/master/aioboto3/s3/inject.py#L202
It's possible s3transfer now uses different values, since I copied them years ago; see if setting the max io queue to 2 does anything.
The chunk size of the multipart parts being uploaded is 8MB, and it'll read up to 100 of them before pausing, so I'm not too surprised. A sketch of tuning those values follows below.
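A minimal sketch of that tuning, assuming upload_fileobj accepts a boto3 TransferConfig through its Config parameter (as boto3's does) and that the reduced max_io_queue is honoured; the chunk size and queue depth shown are illustrative, not prescribed values:

import aioboto3
from boto3.s3.transfer import TransferConfig

# Bound how many multipart parts can sit in memory at once.
# Values are illustrative; 8MB is the default part size.
transfer_config = TransferConfig(
    multipart_chunksize=8 * 1024 * 1024,  # 8MB parts
    max_io_queue=2,                       # queue at most 2 parts before pausing reads
)

async def upload_stream(raw_stream):
    session = aioboto3.Session()
    async with session.client('s3') as s3_client:
        # Assumes Config= is accepted here, mirroring boto3's upload_fileobj.
        await s3_client.upload_fileobj(
            raw_stream,
            Bucket='bucket',
            Key='filename',
            Config=transfer_config,
        )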
> When I try to upload a stream, the entire file is buffered in memory before it is uploaded. Is there any way to fix this?
This is not always true. The default settings will load 80MB into memory and then upload it. If you are uploading a file smaller than 80MB, then yes, the entire file will be loaded into memory; but for files larger than 80MB it is a streaming algorithm. So there is nothing wrong with the code.
The s3.upload_fileobj now mirrors the S3 Transfer implementation more closely. As for the initial issue, this is working as designed.