Why this script:
Transferring lots of small files to AWS S3 can be very slow. I wrote this script to run several instances of "aws s3 sync" in parallel so we maximize our bandwidth.
Each instance syncs one subdirectory inside the root directory, so the script only helps if your root directory contains subdirectories. If you have a single big directory with all the files in it, the script is of no use.
For example, if you have the following directory:
    Directory: D:\filerprod\images

Mode                LastWriteTime     Length Name
----                -------------     ------ ----
d-----       13/10/2017     17:42            football
d-----       13/10/2017     17:43            motogp
d-----       13/10/2017     17:42            tennis
d-----       13/10/2017     17:42            volley
In this case, the script can run up to 4 instances of the "aws s3 sync" command (or fewer), each one syncing one of the subdirectories (football, motogp, etc).
The maximum number of commands running at the same time is set with the maxConcurrentCmd option.
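As a rough illustration (hypothetical commands, assuming the bucket name from the usage example below), each subdirectory maps to one sync invocation against its own prefix in the bucket:

# One "aws s3 sync" per subdirectory, run concurrently by the script.
aws s3 sync D:\filerprod\images\football s3://somebucketname/football
aws s3 sync D:\filerprod\images\motogp s3://somebucketname/motogp
aws s3 sync D:\filerprod\images\tennis s3://somebucketname/tennis
aws s3 sync D:\filerprod\images\volley s3://somebucketname/volley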
This script does the following:
- Get the list of folders from the specified root path
- Upload the folders in parallel (see the sketch below)
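A minimal PowerShell sketch of that pattern, assuming a Start-Job based throttle; the parameter names mirror the script's options, but the body is illustrative rather than the actual implementation:

param(
    [string]$rootFolder,
    [string]$bucketName,
    [int]$maxConcurrentCmd = 4
)

# Step 1: list the subfolders under the root path.
$folders = Get-ChildItem -Path $rootFolder -Directory

# Step 2: launch one "aws s3 sync" job per subfolder, never more than $maxConcurrentCmd at once.
foreach ($folder in $folders) {
    while ((Get-Job -State Running).Count -ge $maxConcurrentCmd) {
        Start-Sleep -Seconds 2
    }
    Start-Job -ScriptBlock {
        param($src, $dest)
        aws s3 sync $src $dest
    } -ArgumentList $folder.FullName, "s3://$bucketName/$($folder.Name)"
}

# Wait for the remaining jobs and collect their output.
Get-Job | Wait-Job | Receive-Job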
Prerequisites
- The AWS CLI is installed.
- "aws configure" has been run with a valid key pair and region for the account.
The AWS S3 CLI provides additional configuration values you can use to tune your transfers. Have a look at:
- max_concurrent_requests
- use_accelerate_endpoint
See: http://docs.aws.amazon.com/cli/latest/topic/s3-config.html
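Both values can be set with "aws configure set"; the numbers below are only an illustration, and Transfer Acceleration must also be enabled on the bucket itself:

# Let each s3 command use up to 20 concurrent requests (the CLI default is 10).
aws configure set default.s3.max_concurrent_requests 20
# Route transfers through the S3 Transfer Acceleration endpoint.
aws configure set default.s3.use_accelerate_endpoint true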
How to use the script
From the PowerShell console
Just run the command in a PowerShell console.
Example:
./sync_s3.ps1 -rootFolder 'D:\Shares\images' -bucketName somebucketname -maxConcurrentCmd 5
From the Task Scheduler
Have a look at the example in this repository.
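Alternatively, you can register the task directly from PowerShell with the ScheduledTasks cmdlets; a rough sketch, where the task name, schedule, and script path are placeholders:

# Hypothetical registration; adjust the script path, arguments, and trigger to your setup.
$action = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File D:\scripts\sync_s3.ps1 -rootFolder ''D:\Shares\images'' -bucketName somebucketname -maxConcurrentCmd 5'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'SyncImagesToS3' -Action $action -Trigger $trigger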
What's next
- Add a dryrun option (defaulting to False)
- Add logging for each command
- Add a verbose switch