s3-redeploy

Node.js utility to sync files to Amazon S3 and, optionally, invalidate CloudFront distributions.

Designed to ease CloudFront + S3 website hosting management. May be used to efficiently upload files to S3.

Checkout Medium post for background and in-detail improvements description.

Usage

npm >= 5.2.0

$ npx s3-redeploy --bucket bucketName --cwd ./folder-to-sync

npm < 5.2.0

$ npm i --global s3-redeploy
$ s3-redeploy --bucket bucketName --cwd ./folder-to-sync

Options

Parameter name	Mandatory	Description	Default value	Usage examples
--bucket	yes	Name of S3 bucket where to sync the data	-	--bucket s3_bucket_name
--cwd	no	Path to folder to treat as current working one. Glob pattern is applied inside this directory	process.cwd()	--cwd ./website --cwd /home/user/website
--pattern	no	Glob pattern, applied within the `cwd` directory. The glob module is used to perform this operation. If no files match the pattern, the script will exit and leave bucket as is. This is done in order to prevent occasional bucket clearance due to wrong `pattern`/`cwd` combination.	'./**'	--pattern './.{js,html}' `` works as a `globstar`, see docs
--gzip	no	Indicates whether the content should be gzipped. A corresponding `Content-Encoding: gzip` header is added to objects being uploaded. If an array of extensions passed, only matching files will be gzipped. Array should be represented as a semicolon-separated list of extensions without dots.	false	--gzip --gzip 'html;js;css'
--profile	no	Name of AWS profile to be used by AWS SDK. See AWS Docs. If a region is specified in the credentials file under profile, it takes precedence over `--region` value	-	--profile stage_profile
--region	no	Name of the AWS region, where to apply the changes	-	--region eu-west-1
--cf-dist-id	no	Id of CloudFront distribution to invalidate once sync is completed	-	--cd-dist-id EDFDVBD632BHDS5
--cf-inv-paths	no	Semicolon-separated list of paths to invalidate in CloudFront	'/*'	--cf-inv-paths '/about;/help'
--ignore-map	no	Dictionary of files and correspondent hashes will be ignored upon difference computation during sync process. This is helpful if state of S3 bucket has been changed manually (not through s3-redeploy script) but the dictionary remained unchanged. The dictionary state will be omitted during computation and at the same time a new dictionary will be computed and uploaded to S3 so it could be used in further invocations.	false	--ignore-map
--no-map	no	Use this flag to store and use no file hashes dictionary at all. Each script invocation will result in uploading of all the files stored locally. If bucket already contains a dictionary file, it will be removed on next script invocation	false	--no-map
--no-rm	no	By default all the removed locally files will be also removed from S3 during sync. Use this flag to override default behavior and upload new files / update changed ones only. No files will be removed from S3. At the same time, the file hashes map (if used) will be updated to mirror relevant S3 bucket state properly.	false	--no-rm
--concurrency	no	Sets the maximum possible amount of network / file system operations to be ran in parallel. In particular, it means that files uploading will be performed in parallel. The same is true for file system operations. Note: it is safe to run file system operations in parallel due to streams API usage	5	--concurrency 8
--file-name	no	Utility by default uploads a file containing md5 hashes upon folder sync. This file is used during the sync operation and lets to minimize amount of network requests and computations. If file name changes, the file with old name will still remain in the bucket until a new sync performed	`_s3-rd.<bucket>.json`	--file-name hashes_map.json
--cache	no	Sets `Cache-Control: max-age=X` for uploaded files. Must be passed in seconds	-	--cache 3600
--immutable	no	Sets `Cache-Control: immutable` for uploaded files. For more info see article. Also check browsers support here	-	--immutable
--verbose	no	Adds additional info to logs: execution parameters, list of local file system objects, list of objects to be uploaded, list of objects to be deleted	false	--verbose

A simple lightweight validation process is implemented, but it is still possible to pass arguments in wrong format, e.g. --file-name is not checked against regex.

Background

Package provides an ability to sync a local folder with an Amazon S3 bucket and create an invalidation for a CloudFront distribution. Extremely helpful if you use S3 bucket as a hosting for your website.

The package has a really small amount of only well known and handy dependencies. It also uses no transpilers, etc., which means package contains no garbage dependencies and the size is rather small.

The idea was inspired by s3-deploy but another approach to work out the sync process has been taken. The default assumptions and set of functionality is also slightly different. Feel free to submit an issue or a feature request if something crucial is missing.

How it works

In general, the script lets one to sync a local folder state to S3 bucket and, if needed, creates an invalidation for a CloudFront distribution by id. All the S3 bucket and CloudFront distribution state manipulations are performed through AWS SDK for Node.js.

The common scenario is:

Script computes MD5 hashes for local files, filtered by cwd/pattern parameters combination and builds a so-called map of hashes.
Then, S3-stored objects' map of hashes is built.
- If no objects persist in S3, bucket is filled with local files and a map for locally stored files is uploaded.
- If there are already objects in bucket, script will look for a file with hashes map. It will be used in order to detect the difference between local and S3 states. If no map is found or no dictionary is intentionally used, ETags of S3 objects will be checked to determine possible changes. Keep in mind that AWS may fill ETag with non-MD5 value in certain cases. See AWS Docs on that. So, same files may be treated as different when relying on S3 ETags. Anyway, if ETags do not match, a local version of file will be re-uploaded. This way or another, bucket will get the updated file eventually.
Once difference is computed, it will be applied to S3: removed locally files will be also removed from S3, updated locally files will be uploaded to S3.
Single object's ETag is set to file's MD5 upon uploading.
After difference application, the updated map of file hashes is uploaded to S3. The map contains MD5 hashes list along with previous execution parameters.
If CloudFront distribution id is supplied, an invalidation will be created once sync process is complete.

A remotely stored dictionary is the main advantage of this package. It is a simple, gzipped file with a description of current state of S3 bucket. It drastically increases processing speed and provides an ability to decrease number of S3 requests. It also allows to abstract from AWS' mechanism of ETags computation.

The process may be tuned using flags mentioned above in different ways. See list of options.

IMPORTANT: If you change the state of bucket manually, the contents of the dictionary will not be updated. Thus, perform all the bucket update operations through the script or consider manual dictionary removal / --ignore-map flag usage, which will let the dictionary to be computed again and stored on the next script invocation.

Tests

Clone the repo and run the following command:

npm i && npm run test

License

MIT

Would be nice to do in future:

build redirect objects
use maps instead of objects
ability to list file names only (with no processing)

Additional things to consider:

check what could be done with versions and prefixes for S3 objects
take a look at ACL:private for hash map
copy objects instead of uploading again on meta change? tbc

batrudinych/s3-redeploy