Lombiq/Infrastructure-Scripts

Make Set-AzureWebAppStorageContentFromStorage faster (INFRA-139)

Closed this issue · 3 comments

Set-AzureWebAppStorageContentFromStorage copies blobs one by one. If you have a lot of files (like Orchard Media files), this takes a long time even if the total size is not that big: for example, copying 13k blobs with a total size of 2.6 GiB took more than 30 minutes.

Let's make this faster somehow. E.g., can whole containers be copied at once? Or at least not each blob individually? Or can this loop be parallelized? I don't know for sure, but I doubt any of the resources, including network, are maxed out on the CI machine during such a copy, so parallelization may help.

There might be some throttling as well, since the copy process might get stuck on files that are otherwise trivial.


Yeah, we've got some outdated stuff going on here. At the time of implementing this, there was no better way to apply the necessary filtering for blobs. Even without filtering, AFAIR the copy needed to be executed one by one, but that was a long time ago. My best guess (without any research done on this) is that az copy might be suitable for upgrading this process.

AzCopy can be used as well (or is it the same as az copy?). It doesn't download the files, but the copy operation still seems to happen one by one. This can be configured to be concurrent, however.

I think az copy is an umbrella command, so AFAIR it can copy other types of resources too, but I don't know if it's using the same concept/library as AzCopy under the hood when processing blobs. AzCopy is specific to Azure Blob Storage.

As far as I could find out, the copy happens one by one (and definitely has to when we need to do any filtering), but yes, concurrency would help anyway.
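
To illustrate the idea of keeping per-blob filtering while running the copies concurrently, here is a minimal sketch in Python (not the PowerShell the script is actually written in). Everything in it is an assumption for illustration: `copy_blobs_concurrently`, `should_copy`, and `copy_blob` are hypothetical names, and `copy_blob` stands in for whatever starts a single blob copy. It makes no real Azure calls.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def copy_blobs_concurrently(
    blob_names: Iterable[str],
    should_copy: Callable[[str], bool],
    copy_blob: Callable[[str], None],
    max_workers: int = 16,
) -> tuple[list[str], dict[str, Exception]]:
    """Filter blob names one by one, then copy the matching ones in parallel.

    copy_blob is a hypothetical stand-in for the call that copies a single
    blob; it is expected to raise an exception when the copy fails.
    """
    # Filtering still looks at each blob individually.
    selected = [name for name in blob_names if should_copy(name)]

    copied: list[str] = []
    failed: dict[str, Exception] = {}

    # The copies are mostly waiting on the network, so threads are enough.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(copy_blob, name): name for name in selected}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                copied.append(name)
            except Exception as error:
                # Record the failure and keep copying the remaining blobs.
                failed[name] = error

    return sorted(copied), failed
```

The number of workers is a guess and would need tuning against whatever throttling the storage account applies. Collecting failures instead of stopping at the first one means a single stuck or failing blob doesn't block the rest of the copy.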