peak/s5cmd

Download speed reduces with large files (> ~100GB)


Using s5cmd v2.2.1-be63977, I see good throughput at the start of a download (~900 MBps), but it gradually drops to about 300 MBps and stays there.

Command used:
s5cmd.exe cp --sp --concurrency 8 "s3 file path" "local path"

If concurrency is not passed as a parameter, the download speed stays constant throughout the download (~280 MBps). All of this is being done on an EC2 instance with the following configuration:

  • Instance Type: C6a.12xlarge
  • Volume Type: gp3
  • IOPS: 4000
  • Throughput: 1000MB/s
  • OS: Windows Server 2022

Am I using concurrency wrong? Or is there a bug in its implementation?

Technically, you are not using it wrong, and there is no bug (in the sense we usually mean).

According to the AWS docs, gp3 volumes have a 64 KiB I/O size. With concurrent downloads we issue random writes rather than sequential writes, so the actual download speed is limited by IOPS × I/O size: 4000 × 64 KiB, or about 250 MiB/s (roughly 262 MB/s).

The same AWS docs give the IOPS limit as 16000. If possible, raising the volume's IOPS to 16000 would increase write throughput to as much as 1000 MBps.
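To make the arithmetic concrete, here is a small sketch of the IOPS × I/O size estimate. The function name `write_ceiling_mib_per_s` is made up for illustration, and the 64 KiB I/O size is the figure quoted above, taken as an assumption rather than measured.

```python
def write_ceiling_mib_per_s(iops, io_size_kib=64):
    """Rough random-write ceiling in MiB/s: IOPS times I/O size.

    io_size_kib defaults to the 64 KiB figure quoted in this thread
    (an assumption, not a measured value).
    """
    return iops * io_size_kib / 1024  # KiB/s -> MiB/s


# The volume as configured (4000 IOPS) and at the 16000 IOPS limit.
for iops in (4000, 16000):
    print(f"{iops} IOPS x 64 KiB = {write_ceiling_mib_per_s(iops):.0f} MiB/s")
```

With 4000 IOPS the estimate comes out at 250 MiB/s, which is in the same range as the ~280 to 300 MBps reported in the issue; with 16000 IOPS it reaches 1000 MiB/s, matching the volume's configured throughput.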

The same problem (situation?) was also mentioned in #418 and #667.
At one point I naively tried to speed things up by making the writes sequential, but I couldn't get it to work. The physical disk behind an EBS volume is probably shared with other users (and the allocated volume may be spread across multiple physical disks), so it does not seem possible to force sequential writes.
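To illustrate why concurrent downloads end up as non-sequential writes in the first place, here is a toy model. It is not s5cmd's actual implementation: the file size, part size, and the use of a shuffle to stand in for unpredictable completion order are all assumptions, and `part_offsets` and `count_seeks` are hypothetical helper names.

```python
import random

MIB = 1024 * 1024


def part_offsets(file_size, part_size):
    """Starting byte offset of each part when a file is split into fixed-size parts."""
    return list(range(0, file_size, part_size))


def count_seeks(write_order, part_size):
    """Count writes that do not start where the previous write ended."""
    seeks = 0
    expected = 0
    for offset in write_order:
        if offset != expected:
            seeks += 1
        expected = offset + part_size
    return seeks


offsets = part_offsets(file_size=64 * MIB, part_size=8 * MIB)  # 8 parts

# One worker: parts finish in order, so every write is contiguous.
print(count_seeks(offsets, 8 * MIB))  # prints 0

# Several workers: parts finish in an unpredictable order (modeled by a shuffle),
# so most writes land away from the end of the previous one.
shuffled = offsets[:]
random.Random(0).shuffle(shuffled)
print(count_seeks(shuffled, 8 * MIB))
```

In this model, the more parts complete out of order, the more writes the volume sees as separate I/O operations at scattered offsets, which is the case where the IOPS limit rather than the throughput limit becomes the bottleneck.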