datalad/git-remote-rclone

Support backends like crypt that don't support file hashes


rclone's crypt backend does not support hashes, so git-remote-rclone fails when used with it.

git-remote-rclone exits early: it finds no hashes on the server and assumes the remote is empty. Temporary workaround: comment out line 150:
if repo_hashes is None: return
This is only a hack: other parts of the code still fail or produce spurious error messages. Also note that an initial git push to an empty repo does not work with this patch.

This request is to enhance git-remote-rclone to work on backends that don't support hashes. Some ideas and options:

  • fall back to downloading the file and computing the hash locally
  • accept a user-supplied function to compute the hash. For rclone's crypt, this would mean computing the hash of the raw underlying file
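A minimal sketch of the first option in Python. It assumes a hypothetical fetch_copy callable that downloads the remote file (for example by wrapping rclone copyto) and returns the local path; the function names and the SHA-256 default are also assumptions, and whatever algorithm is used must match the one the rest of the code compares against.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def file_hash(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a local file chunk by chunk so large archives need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def effective_hash(server_hash: Optional[str], fetch_copy: Callable[[], Path]) -> str:
    """Use the backend's hash when it reports one; otherwise download and hash locally.

    server_hash is None for backends such as crypt that report no hashes.
    fetch_copy (hypothetical) is only called in that case.
    """
    if server_hash is not None:
        return server_hash
    return file_hash(fetch_copy())
```

Passing a callable instead of a path means the download only happens for backends that report no hash, so hash-capable remotes are not slowed down.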

I will write a patch for this soon-ish. The plan:

  • Create a hashing fallback that records both the latest git state and the expected file content in the file name:
    • Prefix the name with crypt- to indicate a crypt remote
    • Append the hash of the last commit
    • Append the file size

This avoids having to download the remote file just to check whether everything is in order, which I'd rather not do: it would be slow, and rclone remotes are slow enough as it is :)
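A minimal sketch of how such a name lets the client judge the remote from a listing alone. The exact layout (crypt-<base>-<commit>-<size>), the base name, and the helper names are hypothetical; the input is assumed to be lines of rclone lsf --format sp output ("<size>;<name>"), which presumes the backend reports file sizes in listings.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Hypothetical layout for the plan: crypt-<base>-<last commit sha>-<size in bytes>
_NAME = re.compile(r"^crypt-(?P<base>.+)-(?P<commit>[0-9a-f]{40,64})-(?P<size>\d+)$")


class RemoteState(NamedTuple):
    base: str
    commit: str
    size: int


def encode_name(base: str, commit: str, size: int) -> str:
    return f"crypt-{base}-{commit}-{size}"


def decode_name(name: str) -> Optional[RemoteState]:
    m = _NAME.match(name)
    return None if m is None else RemoteState(m["base"], m["commit"], int(m["size"]))


def check_remote(lsf_lines: Iterable[str], base: str, local_commit: str) -> str:
    """Classify the remote from a directory listing, without downloading anything."""
    for line in lsf_lines:
        size_field, _, name = line.strip().partition(";")
        state = decode_name(name)
        if state is None or state.base != base:
            continue  # not one of our archives
        if int(size_field) != state.size:
            return "incomplete"  # stored size disagrees with the size recorded in the name
        return "matches" if state.commit == local_commit else "differs"
    return "empty"
```

The size check catches a truncated or interrupted upload, and the commit check tells whether a push or fetch is needed, all from metadata that a listing already returns.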

Thanks for the discussion and ideas! I went ahead and implemented it, and have a rough but working implementation here. This version:

  • includes support for backends that don't support hashing, like crypt
    • doesn't treat such backends specially: it no longer depends on rclone's --hash feature, and instead computes the hash itself and includes it in the filename
  • removes the dependency on 7z, using Python's built-in gzip support (via the zlib module that ships with Python) instead
  • does a bunch of refactoring
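A rough sketch of the packing step described in the list above, using only Python's standard library (tarfile writing through gzip, which relies on zlib). It is not the git-remote-rclone-reds code: hashing the archive bytes with SHA-256, the "repo" stem, and the helper names are assumptions, chosen to mirror the repo-SHA.tar.gz naming mentioned in this thread.

```python
import gzip
import hashlib
import io
import tarfile
from pathlib import Path


def build_archive(src_dir: Path) -> bytes:
    """Pack src_dir (e.g. a bare git repo) into an in-memory .tar.gz using only the stdlib."""
    buf = io.BytesIO()
    # mtime=0 writes a zero timestamp into the gzip header; tar members still keep file mtimes.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            tar.add(src_dir, arcname=".")
    return buf.getvalue()


def archive_name(data: bytes, stem: str = "repo") -> str:
    """Hypothetical repo-<SHA>.tar.gz name; SHA-256 of the archive bytes is an assumption."""
    return f"{stem}-{hashlib.sha256(data).hexdigest()}.tar.gz"


def write_archive(src_dir: Path, out_dir: Path) -> Path:
    """Write the archive under its content-derived name and return the path."""
    data = build_archive(src_dir)
    out = out_dir / archive_name(data)
    out.write_bytes(data)
    return out
```

Because the hash is part of the name, a client can compare names from a listing instead of asking the backend for a hash, which is what makes crypt-style remotes workable.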

At some point I'll create a cleaner release. Alternatively, if you'd like me to merge it all into this repo, let me know.

Released here:

```sh
pip3 install git-remote-rclone-reds
```

Looks interesting, @redstreet, thanks for sharing! But could you clarify: is it backward compatible, i.e. can it be used with existing rclone remotes populated by the current git-remote-rclone? You mentioned that it no longer uses --hash, so would it compute exactly the same hashes?

Good question. It is not backward compatible with existing remotes. Migrating a repo is required.

However, migrating a repo back and forth between git-remote-rclone and git-remote-rclone-reds is very easy and needs only a small one-time change: instead of repo.7z, git-remote-rclone-reds uses repo-SHA.tar.gz. So in theory you could unpack repo.7z, repack it, and rename the result using the ~/gnu/git-remote-rclone/compute_sha.py script that ships with git-remote-rclone-reds, and that works fine.

But there's a much easier way, which is to get git to do all the work:

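```sh
# Upgrade to git-remote-rclone-reds
pip3 uninstall git-remote-rclone
pip3 install git-remote-rclone-reds

# Upgrade the repo. The existing remote points at the old location:
git remote -v
# origin  rclone://cloud/old (fetch)
# origin  rclone://cloud/old (push)

# Add a remote at a new location and push to it
git remote add new rclone://cloud/new
git push -u new main

# You can now verify rclone://cloud/new looks right, and then rename it to `origin`
# (keeping the old remote around under another name):
git remote rename origin old
git remote rename new origin
```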

This works for migrations in the opposite direction too.