/s3-xxHash-transfer

Shell scripts to automate xxHash checksum verification for transfers via the AWS S3 CLI

Primary LanguagePowerShellMIT LicenseMIT

S3 xxHash Transfer

Included in this repository are shell scripts that glue together the MHL tool and the AWS CLI. This allows for a workflow that can transfer enormous amounts of data through an S3 bucket with extremely fast checksum verification. These scripts ensure bulletproof data integrity, verifying every single bit, with blazingly fast speed afforded by 64-bit xxHash and AWS S3.

Workflow

  1. On the "source" computer:
    1. Depending on your OS:
      1. On macOS or Linux, run $ sh send.sh. sudo privilege is not required.
      2. On Windows, run PS> send.ps1.
    2. The "send" script will prompt for the source directory, "seal" the contents of the directory with 64-bit xxHash checksums, prompt for an S3 URL, and then ingest the entire directory into the bucket.
  2. On the "destination" computer:
    1. Depending on your OS:
      1. On macOS or Linux, run $ sh receive.sh. sudo privilege is not required.
      2. On Windows, run PS> receive.ps1.
    2. The "receive" script will prompt for the S3 URL specified earlier by the "send" script, prompt for the local directory path into where the data will be downloaded, and then will automatically download the specified data from the S3 bucket and verify their 64-bit xxHash checksums.

The MHL file generated on the sending side and verified on the receiving side functions as as a kind of manifest for the data, which ensures end-to-end data integrity. These scripts use the extremely fast 64-bit xxHash hashing algorithm.

System requirements

  • The MHL tool should be installed into your $PATH. On CentOS 7.8 and Fedora 32, after compiling from source so that mhl will call the properly installed versions of the OpenSSL libraries, it is recommended that you manually move the mhl binary into /usr/local/bin, since the program will not be managed by the distribution's package manager.
  • The .pkg installer from Pomfort will install a precompiled binary for macOS into /usr/local/bin, which is included by default in macOS's $PATH.
  • On Windows, download and extract the precompiled binary from Pomfort, and then copy or move mhl.exe into C:\Windows\System32\, which is included by default in the Windows Path system environment variables.
  • The AWS CLI should be installed and configured on both endpoints, with:
    • The sending IAM user having permission to write into the S3 bucket
    • The receiving IAM user having permission to read from the S3 bucket
    • Both endpoints connected to the same region
    • The command output format set to text

Tested platforms

Release 2.0.0 has been tested on Linux, macOS, and Windows, specifically on:

  • Fedora 32 and higher
  • CentOS 7.8.2003 and higher
  • macOS Catalina 10.15.4 and higher
  • Windows 10 1909 and higher

There aren't too many dependencies, so these scripts seem like they should work flawlessly on other major Linux distributions as well, though no other distributions have been tested.

Though zsh is now the default shell on macOS Catalina, the script runs in bash, as specified from the first line of the script: #!/bin/bash. For now, Catalina still ships bash. Whether future releases of macOS will contain bash is an open question. The scripts may need to be modified in the future to run natively in zsh, but at least for now, on Catalina, bash works.