peak/s5cmd

Very low throughput when uploading single file compared to using awscli


Uploading a single file of around 152M is significantly slower with s5cmd than with awscli. awscli achieves a throughput of around ~55MiB/s, whereas s5cmd only reaches ~4.4MiB/s. I tested with various concurrency settings (1, 5, 10, 25, 50) and always 1 worker (since it's a single file), and it makes close to no difference. I also tested with various file sizes (36M, 152M, 545M, 2.6G, 6.9G) and observed the same low throughput.
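To put that in perspective: at ~55MiB/s the 152M file uploads in roughly 152 / 55 ≈ 3 s, while at ~4.4MiB/s the same transfer takes roughly 152 / 4.4 ≈ 35 s.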

Here's a screenshot of a network capture I made comparing awscli (left) and s5cmd (right) using a concurrency setting of 5:

[Screenshot 2023-09-22 at 9:44:41 AM: network capture, awscli on the left, s5cmd on the right]

It seems like s5cmd is transferring the file in many smaller chunks instead of fewer, bigger chunks as awscli does.

The command I'm using is:

s5cmd \
    --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=5 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"

Versions:

❯ aws --version
aws-cli/2.11.5 Python/3.11.2 Linux/5.4.0-163-generic exe/x86_64.ubuntu.20 prompt/off

❯ s5cmd version
v2.2.2-48f7e59

I'm using Ceph S3 and I'm able to reproduce the issue when running the same upload command on other servers.

Hi, there is a flag part-size for the cp command. You can adjust the chunk size as you wish.
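For example, adapted from your command above (if I remember correctly, the value is interpreted in megabytes):

s5cmd \
    --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=5 --part-size=50 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"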

@denizsurmeli, unfortunately --part-size didn't help.

I tested with all the combinations of the following parameters:

  • Object size to upload: 36M, 152M, 545M
  • Concurrency: 1, 5, 10, 25, 50, 100
  • Part size (MB): 5, 10, 25, 50

concurrency = 25, part_size = 10 gave the best throughput (around 20 MB/s), while most of the other combinations yielded throughputs of 2-5 MB/s. 20 MB/s is still way below what awscli can do. For small objects (less than around 20MB), s5cmd wins, but only because it has no startup overhead, whereas awscli takes around 6-7s before it actually starts doing anything.
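Concretely, the best-performing combination corresponds to something like:

s5cmd \
    --profile my_profile --numworkers=1 \
    --endpoint-url=https://mycephs3endpoint \
    cp --concurrency=25 --part-size=10 --show-progress \
    "${temp_dir}/archive.tar.lz4" \
    "s3://${bucket_name}/test-mboutet/${key}/archive.tar.lz4"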

Just for reference:

The problem seems to be related to #418

At the time, I tried to tackle it but couldn't :(

I made a few attempts to optimize write requests to achieve increased throughput without using the storage-optimized instances, but I couldn't find a viable solution.

#418 (comment)

see also #418 (comment)