yakovkhalinsky/backblaze-b2

Auto-compute SHA1 sum for streams


Related to #32. Applies to uploadPart and uploadFile.

If `hash` is not passed and `data` is a stream, the SHA1 can be computed on the fly and appended to the request body, with the header `X-Bz-Content-Sha1: hex_digits_at_end` telling B2 that the digest follows the content as 40 hex characters. It would be nice if the client handled this logic itself.
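For content that is already in memory, the wire format this header selects can be sketched directly: the body is the raw bytes followed by the 40 lowercase hex characters of their SHA1, and `Content-Length` counts both. `appendSha1Trailer` is a hypothetical helper for illustration, not part of backblaze-b2.

```javascript
const crypto = require('crypto');

// Hypothetical helper: builds the exact bytes a "hex_digits_at_end" upload
// sends, for content that is already held in a Buffer.
function appendSha1Trailer(content) {
  // A SHA1 digest is 20 bytes, i.e. 40 lowercase hex characters.
  const digest = crypto.createHash('sha1').update(content).digest('hex');
  const body = Buffer.concat([content, Buffer.from(digest, 'ascii')]);
  return {
    body,
    headers: {
      'X-Bz-Content-Sha1': 'hex_digits_at_end',
      // Content-Length covers the content plus the 40-character trailer.
      'Content-Length': String(content.length + digest.length),
    },
  };
}

const { body, headers } = appendSha1Trailer(Buffer.from('hello'));
console.log(headers['Content-Length']); // 45
console.log(body.subarray(-40).toString()); // aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
```

The streaming version produces the same bytes, just without ever holding the whole body in memory.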

This change is simpler than it first appears. I wrote the following transform stream, which hashes the content as it passes through and then emits the hex digest just before the stream ends. We are using it successfully in production.

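```javascript
const crypto = require('crypto');
const stream = require('stream');

// Returns a Transform that passes every chunk through unchanged while feeding
// it to a SHA1 hash, then appends the 40-character hex digest at the end.
function makeSha1AppendingStream() {
    const d = crypto.createHash('sha1');

    return new stream.Transform({
        transform(chunk, encoding, cb) {
            d.update(chunk, encoding); // hash the data...
            this.push(chunk, encoding); // ...and forward it untouched
            cb();
        },

        // Called once all input has been consumed, before 'end' is emitted.
        flush(cb) {
            this.push(d.digest('hex'));
            cb();
        },
    });
}
```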
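To check a transform like this in a unit test, one option is to collect its output and confirm that the last 40 characters are the SHA1 of everything before them. The helper below is a hypothetical sketch for that purpose; it assumes the whole body fits in memory, which is fine for tests but not for real uploads.

```javascript
const crypto = require('crypto');

// Hypothetical test helper: splits a body produced by a SHA1-appending stream
// into the original content and the 40-character hex trailer, and checks that
// the trailer really is the SHA1 of the content.
function splitSha1Trailer(body) {
  if (body.length < 40) {
    throw new Error('body too short to contain a SHA1 trailer');
  }
  const content = body.subarray(0, body.length - 40);
  const digest = body.subarray(body.length - 40).toString('ascii');
  const expected = crypto.createHash('sha1').update(content).digest('hex');
  return { content, digest, ok: digest === expected };
}
```

For example, pipe a small readable through `makeSha1AppendingStream()`, concatenate the emitted chunks with `Buffer.concat`, and assert that `splitSha1Trailer(result).ok` is `true`.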

It can be used like this (adjust variable names as needed):

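```javascript
if (hash === undefined && typeof data.pipe === 'function') {
  const hashStream = makeSha1AppendingStream();
  // pipe() does not forward errors, so propagate source errors manually.
  data.on('error', err => { hashStream.emit('error', err); });
  data = data.pipe(hashStream);

  hash = 'hex_digits_at_end';
  contentLength += 40; // the hex SHA1 digest adds 40 bytes to the body
}
```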
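A variant of the same wiring uses `stream.pipeline()` instead of `pipe()` plus a manual error listener. `pipeline()` also destroys the source if the hash stream is destroyed, so an aborted upload should not leave the source open. `prepareStreamUpload` is a hypothetical name, and the transform is repeated (slightly condensed) so the snippet runs on its own.

```javascript
const crypto = require('crypto');
const { Transform, pipeline } = require('stream');

// Same idea as makeSha1AppendingStream(): pass data through, then append the hex SHA1.
function makeSha1AppendingStream() {
  const hash = crypto.createHash('sha1');
  return new Transform({
    transform(chunk, encoding, cb) {
      hash.update(chunk);
      cb(null, chunk);
    },
    flush(cb) {
      this.push(hash.digest('hex'));
      cb();
    },
  });
}

// Hypothetical helper: wraps a readable stream for upload when no hash was given.
function prepareStreamUpload(data, hash, contentLength) {
  if (hash !== undefined || typeof data.pipe !== 'function') {
    return { data, hash, contentLength };
  }
  const hashStream = makeSha1AppendingStream();
  pipeline(data, hashStream, (err) => {
    // Make sure a source error reaches hashStream, which the HTTP client reads.
    // This is a no-op if pipeline() already destroyed it with that error.
    if (err) hashStream.destroy(err);
  });
  return { data: hashStream, hash: 'hex_digits_at_end', contentLength: contentLength + 40 };
}
```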

Side note: when streams are used, all retrying and redirect-following should be disabled. Replaying the request is either unsafe, because the stream has already been consumed, or memory-hungry, because the entire request body has to be buffered in case the request is replayed. We had to pass `maxRedirects: 0` to axios, or process memory would balloon (we upload files of several hundred MB, and this was killing us).
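As a sketch of where that option goes, here is a hypothetical helper that builds an axios request config for a streamed `b2_upload_part` call. It only returns a plain config object (it does not import axios), and it assumes `uploadUrl` and `authToken` come from `b2_get_upload_part_url`. As we understand axios's Node adapter, `maxRedirects: 0` makes it use the plain `http`/`https` transport instead of `follow-redirects`, which keeps a copy of the request body so it can re-send it after a redirect.

```javascript
// Hypothetical helper: axios request config for b2_upload_part when the body
// is a stream that ends with the appended hex SHA1.
function buildStreamedPartRequest({ uploadUrl, authToken, partNumber, body, contentLength }) {
  return {
    method: 'post',
    url: uploadUrl,
    headers: {
      Authorization: authToken,
      'X-Bz-Part-Number': String(partNumber),
      // Must already include the 40 extra bytes of the appended digest.
      'Content-Length': String(contentLength),
      'X-Bz-Content-Sha1': 'hex_digits_at_end',
    },
    data: body,
    // A consumed stream cannot be replayed, so never follow redirects.
    maxRedirects: 0,
  };
}
```

The returned object could then be passed to `axios(config)`.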