Downloading of data from storage is relatively slow
Closed this issue · 5 comments
Describe the bug
Downloading a large number of small files (250+ files of about 10 KB each) proceeds at only ~170 KB/s. As a result, a dataset that is small in terms of storage usage can still take several minutes or longer to download.
To Reproduce
Steps to reproduce the behavior:
- Store 250+ files in the appropriate S3 location of the project
- Log in as creator and download the dataset
- Observe the download speed
Expected behavior
Although there are many files, the total storage volume is small, so on a gigabit connection this should download in seconds.
Screenshots
https://github.com/eyra/mono/assets/88683839/85b777cd-e927-4be4-86f9-80d431b41790
Desktop:
- Browser: Safari and Firefox
@TjerkNan I have no influence on the download speed. That seems to me more of an environment issue than a software issue.
When I run Next locally against the dev S3, it is blazing fast with 20+ files.
Do you only see this with many files, and not with large files?
I found this issue in the Packmatic lib. It looks like the same problem.
I will dig into it.
@TjerkNan Using connection pooling makes the download about twice as fast, but it is still slow. This is caused by the overhead of communicating with S3 for every single file. At this point we can only make this less frustrating for the user by changing the UI to state clearly that the download takes some patience. In the future we might switch to download links that are prepared in the background. For now we can keep it as is and see how it goes with the first pilots.
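To illustrate why per-file overhead dominates here, below is a rough back-of-the-envelope model in Python. It is only a sketch, not a measurement of the actual system: the function names are hypothetical, the ~59 ms per-file cost is an assumption derived from the reported numbers (10 KB per file at ~170 KB/s), and the gigabit figure ignores protocol overhead.

```python
def estimate_download_seconds(file_count, file_size_bytes,
                              per_file_overhead_s, bandwidth_bytes_per_s):
    """Estimate total time for fetching files one after another.

    Total time = fixed overhead per file (request round trip, etc.)
               + raw transfer time of all bytes at the given bandwidth.
    """
    overhead_s = file_count * per_file_overhead_s
    transfer_s = file_count * file_size_bytes / bandwidth_bytes_per_s
    return overhead_s + transfer_s


def effective_throughput(file_count, file_size_bytes, total_seconds):
    """Observed throughput in bytes per second for the whole download."""
    return file_count * file_size_bytes / total_seconds


# 1 Gbit/s expressed in bytes per second (protocol overhead ignored).
GIGABIT = 125_000_000

# Assumption: per-file cost implied by the report, 10 KB / 170 KB/s ~ 0.059 s.
PER_FILE_OVERHEAD_S = 10_000 / 170_000

# Pure transfer time for 250 files of 10 KB, with no per-file overhead.
bandwidth_only = estimate_download_seconds(250, 10_000, 0.0, GIGABIT)

# Same download with the assumed per-file overhead included.
with_overhead = estimate_download_seconds(250, 10_000, PER_FILE_OVERHEAD_S, GIGABIT)

# Halving the per-file overhead, roughly what connection pooling achieved.
with_pooling = estimate_download_seconds(250, 10_000, PER_FILE_OVERHEAD_S / 2, GIGABIT)

print(f"bandwidth only: {bandwidth_only:.3f} s")
print(f"with overhead:  {with_overhead:.3f} s")
print(f"with pooling:   {with_pooling:.3f} s")
print(f"throughput:     {effective_throughput(250, 10_000, with_overhead) / 1000:.0f} KB/s")
```

In this model the raw transfer time is negligible compared to the per-file overhead, which is why a faster connection does not help and why halving the overhead roughly halves the total time.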
@emielvdveen This course of action sounds totally fine to me. As long as people know what to expect, they are fine.
Speed is now around 3+ MB/s, and downloading 1000 files of 200 KB each is totally fine; it just takes a few minutes.