rfjakob/gocryptfs

Feature Request: --include / --include-from

brain-freeze opened this issue · 18 comments

Hey,
I switched to gocryptfs from encfs and it works great so far. However, performance over sshfs is much worse than it was with encfs (and #35 is still a problem). So I thought reverse mode may be the better choice for backup use cases anyway. The problem there is the missing --include/--include-from option, which would work just like the rsync option of the same name. It would allow creating an encrypted view of selected folders instead of an entire directory.

Hi! Thanks for the report. What version do you run, and how much worse is "much worse"?

I don't have exact numbers, but I have the impression that it's related to directories with many small files. I will do some testing and report back.

The version was 1.4.3, but I couldn't reproduce any performance shortcomings compared to encfs. I probably just got that impression after replacing encfs; there isn't a real issue here.
Anyway, it would be great to get rid of the sshfs layer, because it definitely slows down the transfer rate.

Lately I switched to reverse mode for better performance with remote syncing. An --exclude-from option would come in very handy here, too. It could save time by skipping cache folders or VMs that I don't need a backup of.

Might I suggest using bind mounts to map the directories you want to exclude from encryption straight through from the underlying storage?

How would you do an exclude with a bind mount? I'm using https://github.com/gburca/rofs-filtered for now.

mount --bind /unencrypted/storage/thing-that-does-not-need-encryption \
             /encrypted/mountpoint/thing-that-does-not-need-encryption

@charles-dyfis-net: interesting idea!

I think brain-freeze wants to not copy these files at all, not just to leave them unencrypted. I guess you could bind-mount an empty folder over the folder you want to exclude?
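A minimal sketch of that idea, assuming Linux, root privileges and reverse mode on /home/user: each folder to exclude gets an empty directory bind-mounted on top of it before gocryptfs starts, so the encrypted view only shows an empty folder there. The helper names hide_dirs and run, the DRY_RUN switch and the empty directory path are hypothetical, chosen for this example. Keep in mind that the bind mount hides the folder from every process in the same mount namespace, not only from gocryptfs.

```shell
#!/usr/bin/env bash
# Sketch: hide folders from a reverse-mode mount by bind-mounting an empty
# directory over them (needs root). hide_dirs, run and DRY_RUN are
# hypothetical names. Caveat: the folders are hidden for ALL processes in
# this mount namespace until they are unmounted again.
DRY_RUN=${DRY_RUN:-0}

run() {
  # With DRY_RUN=1, print the command instead of executing it.
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# hide_dirs EMPTY_DIR DIR...: mount EMPTY_DIR on top of each DIR.
hide_dirs() {
  local empty=$1 d
  shift
  run mkdir -p "$empty" || return 1
  for d in "$@"; do
    run mount --bind "$empty" "$d" || return 1
  done
}
```

For example, hide_dirs /srv/empty /home/user/Movies /home/user/Music followed by gocryptfs -reverse /home/user /mnt/user.encrypted; running it first with DRY_RUN=1 only prints the commands. umount /home/user/Movies /home/user/Music undoes the bind mounts afterwards.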

The --include-from/--exclude-from feature of rsync is crucial for my backup purposes. With gocryptfs reverse mode it's no longer possible to use it, because the file and folder names are encrypted as well. As far as I know, the only ways to overcome this that work for both folders and files are:

  • Don't use reverse mode, but regular mode on top of sshfs. This approach has really bad performance with my internet connection.
  • Filter at the file system layer before the data gets encrypted. I recently discovered rofs-filtered, which provides exactly this: a read-only, filtered view of a folder. So far it seems to do the job as expected, so it is a viable workaround. However, it would be great to have this built into gocryptfs at some point.

Option number 3 is to look up the encrypted file names by inode number (find -inum) or with https://github.com/rfjakob/gocryptfs/blob/master/contrib/ctlsock-encrypt.bash.
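The find -inum route can be wrapped in a small helper. This is a sketch under two assumptions: the reverse-mode mount reports the same inode numbers as the plaintext tree (which is what the tip relies on), and GNU coreutils/findutils are available for stat -c %i and find -quit. The function name encrypted_name_by_inode is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find the encrypted name of a plaintext file or folder by inode.
# Assumes the reverse-mode view keeps the plaintext inode numbers and that
# GNU stat/find are installed. encrypted_name_by_inode is a hypothetical name.
# Usage: encrypted_name_by_inode PLAIN_PATH CIPHER_ROOT
encrypted_name_by_inode() {
  local plain=$1 cipher_root=$2 ino
  ino=$(stat -c %i -- "$plain") || return 1   # %i = inode number
  # -print -quit stops after the first match instead of walking the whole tree.
  find "$cipher_root" -inum "$ino" -print -quit
}
```

For example, encrypted_name_by_inode /home/user/Movies /mnt/user.encrypted would print the encrypted path of Movies inside the mount, whose last component can then go into an rsync exclude rule.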

But I agree that --exclude would be much more user-friendly. I'll see how this could be implemented. The problem is that the stdlib does not support passing multiple --exclude arguments, so I would have to use another argument parsing library.

PS: This is how the backintime backup tool handles exclusions with encfs reverse mode.

PPS: It looks like cobra is the CLI library of choice nowadays.

Personally, I think --exclude-from, where you specify a file containing all paths to exclude, is a bit more useful anyway. If you want to exclude things, you probably want to exclude more than one folder or file, and doing that with multiple --exclude arguments makes the command line messy pretty fast.

I have just added an --exclude feature to reverse mode. Please test!

I have chosen --exclude over --exclude-from because it is more generic. You can still put your excludes into a file like this:

gocryptfs -reverse $(cat exclude.txt) /home/user /mnt/user.encrypted

With exclude.txt containing lines like this:

--exclude Movies
--exclude Music
--exclude "Ebook Collection"

Thanks, it works nicely. Performance is better now that rofs-filtered is no longer in the chain.

I will use this:

EXCLUDE=$(while read i; do echo "--exclude "${i}"";done < "excludes.txt")
gocryptfs -reverse $EXCLUDE folder1 folder2

Thanks for testing! I'll call this ticket done.

The above code cannot possibly work. See BashFAQ #50: Quotes in the result of command substitutions or other expansions are treated as literal, not parsed by the shell as syntax.

Consequently, the quotes added to the EXCLUDE string by the above code would be passed to gocryptfs as literal parts of its argument vector, and any directory name containing spaces would be split, with the parts on each side passed as separate arguments (My Directory generating three arguments: --exclude, then "My with a literal leading quote, then Directory" with a literal trailing quote).

Quotes inside $(cat exclude.txt) are not honored for the same reason.
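To see this concretely, the following sketch feeds an exclude file like the one above through the unquoted $(cat ...) expansion and prints each resulting argument between < and >. A temporary file stands in for exclude.txt.

```shell
#!/usr/bin/env bash
# Sketch: what does "gocryptfs -reverse $(cat exclude.txt) ..." really pass on?
# A temporary file stands in for exclude.txt.
list=$(mktemp)
printf '%s\n' '--exclude Movies' '--exclude "Ebook Collection"' > "$list"

# The unquoted $(...) is split on whitespace; by then the quotes from the
# file are ordinary characters, so they end up inside the arguments.
out=$(printf '<%s>\n' $(cat "$list"))
printf '%s\n' "$out"
rm -f "$list"
```

Ebook Collection arrives as two separate arguments, each still carrying a literal quote character.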

A variant that would work (using an array to collect the arguments, rather than collecting them in a string and expanding it unquoted) follows:

excludes=( )
while read -r i; do
  excludes+=( --exclude "$i" )
done <excludes.txt
gocryptfs -reverse "${excludes[@]}" folder1 folder2
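A slightly more defensive sketch of the same array approach, for bash only. The extras are assumptions, not gocryptfs features: blank lines and lines starting with # are skipped, leading and trailing spaces in names are preserved (IFS=), and a last line without a trailing newline is still read. The function name read_excludes is hypothetical; it fills the global array excludes.

```shell
#!/usr/bin/env bash
# Sketch: collect --exclude arguments from a list file into the global
# array "excludes". read_excludes is a hypothetical helper; skipping blank
# lines and #-comments is this sketch's own convention.
read_excludes() {
  local line
  excludes=()
  # IFS= keeps leading/trailing spaces, -r keeps backslashes literal, and
  # '|| [ -n "$line" ]' still handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    excludes+=( --exclude "$line" )
  done < "$1"
}
```

Used as: read_excludes excludes.txt && gocryptfs -reverse "${excludes[@]}" folder1 folder2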

...or, to support baseline POSIX sh (which doesn't support arrays), one can use a function, which can override its own argument list without making changes to the outer scope:

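mount_with_excludes() {
  # "$@" starts out as the caller's own arguments (here: -reverse folder1 folder2).
  # Each line read from stdin prepends one --exclude pair, so the options
  # stay ahead of the directory arguments.
  while read -r i; do
    set -- --exclude "$i" "$@"
  done
  gocryptfs "$@"
}
mount_with_excludes -reverse folder1 folder2 <excludes.txt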

...will prepend the --exclude arguments, generating a command line that behaves like gocryptfs --exclude Movies --exclude Music --exclude "Ebook Collection" -reverse folder1 folder2 if given a file that contains Music, Movies and Ebook Collection (no literal quotes!) as separate lines. Prepending reverses the order of the --exclude options, which does not matter here.

Heh. Actually, I have to withdraw that "cannot possibly work" a little: the quotes are all syntactic, not literal, in

EXCLUDE=$(while read i; do echo "--exclude "${i}"";done < "excludes.txt")

Basically, you have a quoted string "--exclude ", then an unquoted expansion ${i}, then an empty quoted string "". This doesn't put quotes around $i, as was presumably the author's intent; rather, it ends the quotes before the expansion and runs the expansion unquoted.

A line with My Directory will thus become the three arguments --exclude, My and Directory, rather than --exclude, "My and Directory" with literal quotes. That is still a wrong result, but not the specific wrong result I claimed above.
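A quick way to check this is to run that exact line on a one-line exclude list and print every resulting argument between < and >. A temporary file stands in for excludes.txt.

```shell
#!/usr/bin/env bash
# Sketch: run the EXCLUDE=$(...) line from above on a one-line list and show
# each resulting argument between <...>. A temp file stands in for excludes.txt.
list=$(mktemp)
printf '%s\n' 'My Directory' > "$list"

# Same construct as above: "--exclude " + unquoted ${i} + "".
EXCLUDE=$(while read i; do echo "--exclude "${i}"";done < "$list")
out=$(printf '<%s>\n' $EXCLUDE)   # unquoted, as in "gocryptfs -reverse $EXCLUDE ..."
printf '%s\n' "$out"
rm -f "$list"
```

The output is <--exclude>, <My> and <Directory> on separate lines: no literal quotes, but the name is still split in two.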

What's even more fun: if you were trying to exclude a directory named My Work * KEEP OUT *, each whitespace-surrounded * would be replaced with the list of file names in the directory where the script is run.
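The following sketch shows both effects side by side in a scratch directory containing two made-up files, a.txt and b.txt: the unquoted string expansion splits the name and globs each *, while a quoted array expansion keeps it as one argument. The function names show_unquoted and show_array are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: unquoted string expansion vs. quoted array expansion.
# The scratch directory and a.txt/b.txt exist only for this demonstration.
line='My Work * KEEP OUT *'

show_unquoted() {
  local EXCLUDE="--exclude $line"
  printf '<%s>\n' $EXCLUDE        # word splitting, then each * is globbed
}

show_array() {
  local args=( --exclude "$line" )
  printf '<%s>\n' "${args[@]}"    # quoted: each element stays one argument
}

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir" || exit 1
touch a.txt b.txt
show_unquoted
show_array
```

The unquoted version yields nine arguments (--exclude, My, Work, a.txt, b.txt, KEEP, OUT, a.txt, b.txt); the array version yields exactly two.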