odeke-em/drive

pull/push: add end-to-end encryption and decryption

odeke-em opened this issue · 14 comments

It would be great to have the ability for a user to do encrypted/decrypted pushes/pulls, after the user passes in their credentials, and the subject content should be stored encrypted on Google Drive but only decrypted by the user.

Such was done as an experiment ages ago with drive and with https://github.com/odeke-em/sasel -- my first real-ish Java program, so don't judge :)

Sounds like a very good idea 👍

Disclaimer: I suck at cryptography and know nothing about it but I'd love to learn and work with ideas on it.

I have been thinking about this lately and so far I haven't found a good way in Go to do this. The crypto examples and APIs all seem to return a []byte slice (even though they might take in an io.Reader). This means that the result of any operation is wholly buffered into memory; for a 5GB file, for example, this could potentially blow up memory.
Adapting from the wiki entry https://github.com/odeke-em/drive/wiki/%5BUse-Case%5D-Backup,-archive,-encode-and-push-directory.#above-encrypted-with-openssl-aes-256, we can easily wrap this into:

$ drive push --pass-key-file=passKey --enc [files...]

Adaptation

We can adapt that option to build a small package that basically translates to:

Encryption

a) Obtain an io.Reader for that file.
b) exec to openssl and then pass the reader from a) as stdin to openssl enc -aes-256-cbc -salt, along with the passKey (which could be written to a tempfile instead of being passed directly as an argument to the exec).
c) Obtain the Stdout from the openssl exec and then use that as the handle to a drive file upload.

Decryption

Would be the reverse of encryption, i.e. remote source, local destination, decrypt instead of encrypt.
a) Obtain the io.Reader handle for the encrypted file on Google Drive.
b) exec to openssl and pass the reader from a) as stdin to openssl enc -d -aes-256-cbc -salt, along with the passKey (which could be written to a tempfile instead of being passed directly as an argument to the exec).
c) Obtain the Stdout from the openssl exec and use that as the source of data to write to file.

In short, a simple middle man module that will take a stream of bytes, a mode (encryption/decryption), encryption key and algorithm e.g

type CipherMode uint

const (
    ModeUnknown CipherMode = 1 << iota
    ModeEncrypt
    ModeDecrypt
)

const (
    DefaultOpenSSLAlgo = "aes-256-cbc" // Define these as enums
)

func Cipher(src io.Reader, mode CipherMode, key []byte, openSSLAlgorithms ...string) (io.Reader, error)

func Encrypt(src io.Reader, key []byte, openSSLAlgorithms ...string) (io.Reader, error) {
    return Cipher(src, ModeEncrypt, key, openSSLAlgorithms...)
}

func Decrypt(src io.Reader, key []byte, openSSLAlgorithms ...string) (io.Reader, error) {
    return Cipher(src, ModeDecrypt, key, openSSLAlgorithms...)
}
Example when uploading
fHandle, err := os.Open(localPath)
if err != nil {
  return nil, err
}
var src io.Reader = fHandle
if uploadArgs.Encrypt {
  src, err = enc.Encrypt(src, uploadArgs.EncryptionKey, uploadArgs.Algorithms...)
  if err != nil {
    return nil, err
  }
}

// Write to cloud handle now
Example when downloading
fHandle, err := g.remote.Open(remotePath)
if err != nil {
  return nil, err
}
var src io.Reader = fHandle
if downloadArgs.Decrypt {
  src, err = enc.Decrypt(src, downloadArgs.EncryptionKey, downloadArgs.Algorithms...)
  if err != nil {
    return nil, err
  }
}

// Write to disk now
localF, err := os.Create(g.context.AbsPath(remotePath))
...
io.Copy(localF, src)

Ideas, thoughts, criticisms are greatly appreciated. Thanks.

The AES block cipher in Go's standard library should be sufficient, and since it is a block cipher it only has to operate on 128 bits at a time. The []byte in and out shouldn't be an issue; I have some similar code in a project of mine that reads 4-byte blocks and performs a function on them to swap bits around, but implements io.Reader: https://github.com/sselph/scraper/blob/master/rom/hash/hash.go#L148

You should also consider using a MAC to authenticate the message contents. This is great for determining that the passkey is correct; otherwise you decrypt data without knowing whether the key is right, and the data coming out is random.

I actually had some time and this seemed like a fun thing to do, so I decided to write some code. I was able to use pure Go and can encrypt large files without holding the file in memory. I used AES-256 in CTR mode with a random per-file IV. I used a SHA-256 HMAC for authenticating the message. This was less ideal than using GCM, but it allowed me not to hold the entire file in memory. The keys used for AES and HMAC are created using scrypt with the password provided by the user and a random per-file salt. I created two structs that wrap io.Reader and implement io.Reader but encrypt/decrypt while reading. The only issue is that for the decryption I had to read the encrypted contents to a temp file to authenticate it before decrypting.

Using it looks something like:

encReader, err := NewEncryptReader(reader, password)
b, err := ioutil.ReadAll(encReader)  // b is encrypted
decReader, err := NewDecryptReader(bytes.NewBuffer(b), password)
defer decReader.Close()
b, err = ioutil.ReadAll(decReader)  // b is decrypted

I need to clean it up, add documentation, add testing, add some logic for having multiple algorithms and determining which was used to encrypt, etc. I also haven't added it to the drive push/pull commands and have just been using byte slices in and out to test it.

There is also the matter of determining whether the remote file differs from the local file. The hashes obviously won't match, and since the file has a random per-file salt+IV we can't just re-encrypt locally to see if the hash would match. There are solutions to this, but they all have some drawbacks.

If this sounds like the right track, I can continue to polish it up and can send a PR to a branch for this feature but I would probably need to leave it up to you to actually plumb in the code to make use of this.

Awesome 🔥🔥, thanks for working on this @sselph. Ahh I see. That's great news that this can be done 128 bits at a time, we are in business!
Your implementation wraps this into io.Reader/Writer consumers which is what we want. Thanks for the example in #543 (comment).

Since I am not that familiar with block ciphers, I wanted to ask: does decryption work the same way, i.e. chunk by chunk?

Great idea about using a MAC to check whether the key is right. Just as a note for when we roll this out: it may increase the attack surface for someone trying to crack a password/decrypt files. The attacker could already manage with a few techniques, such as frequency analysis on every round of decryption or checking MIME types, but with a MAC over only a few bytes a wrong guess fails fast, making for more retries, i.e.

func sign(key, challenge []byte) []byte {
   mac := hmac.New(sha256.New, key) // or whichever hash we settle on
   mac.Write(challenge)
   return mac.Sum(nil)
}

// Arbitrary: we can use 4 blocks or choose however many bytes
// ephemeralTextChallenge could be fresh RandomBytes(), etc
localMAC := sign(append(password, first512BytesBeforeEncryptionAndUpload...), ephemeralTextChallenge)
remoteMAC := sign(append(password, firstDecrypted512BytesAfterDownloading...), ephemeralTextChallenge)
if !hmac.Equal(localMAC, remoteMAC) {
   return nil, ErrInvalidPassword
}

would be easier to crack than decrypting an entire 1GB file only to fail. However, the attacker could just download the file and go at it locally with various techniques, so never mind; let's go with the MAC solution.
If need be, we could let the user choose what mode they'd like, e.g. whole-file decrypting, no MAC-ing, etc.

As for determining if the remote file is different from the local file, maybe we could use the MAC comparison provided above as a compromise? Also, here is a bad and insecure idea that defeats the purpose of the random salt+IV prepended to the file, but an idea nonetheless: we could potentially store some metadata about the file in the Meta part of the remote file, e.g. the hash without the salt+IV included -- or does the crypto API hide the ability for us to seek to the regions where the salt+IV are not included? Using meta fields we could also add some unique identifiers that will match the file.

Please let me know what you think.

I was actually gonna take a stab at the exec-d openssl style I had talked about earlier, but I was doing some bug triage and didn't get time to work on it yet. This sounds like the right track to me; I'm excited to see this come alive. Thank you again @sselph!

Since I am not that familiar with block ciphers, I wanted to ask: does decryption work the same way, i.e. chunk by chunk?

Yes, but as I mentioned, when NewDecryptReader is called it reads all the bytes from the provided reader in 16KB chunks to a temp file. This is because I have to hash the entire file to know that it is valid and safe to decrypt.

it may increase the attack surface for someone trying to crack a password/decrypt files.

I don't think this is too much of a concern, since the attacker still has to hash the entire file to compute the MAC. It would probably be easier to ignore the MAC and just decrypt the first X bytes and see whether the header looks like what the file extension says it should. In either case the attacker has to use scrypt to generate the keys, which in the current implementation I have tuned to take ~1s on a 3.4GHz processor.

As for determining if the remote file is different from the local file

The location of the salt and IV is known (the first 64 bytes of the file) and they are unencrypted.

Some ideas I had are:

  • Store the original hash in the remote file unencrypted at the beginning of the file then do a partial download of that information or if there is a way to store metadata that is cool too. This seems like it might leak information and might be bad but I don't know enough about this to be sure.
  • Get the salt+IV and encrypt locally while computing the MD5 hash for comparison. This has the drawback that computing the encryption key is purposely slow, but maybe the encrypted MD5 could be cached locally to speed up subsequent runs. If you already have a method to cache metadata locally, this is my current favorite.
  • Store the original hash in the remote file encrypted at the beginning of the file. Then do a partial download of that information. This has the same drawback as the previous but doesn't require encrypting the entire file.
  • Don't use a key derivation function and a password and force the user to pass in the full AES and HMAC keys. This solves the issue of taking time to derive a key but means it is more difficult to remember, ie you have to store them locally and back them up.
  • Speed up the key derivation. This makes it easier to brute force but does make generating keys faster and makes some of the other methods less painful.
  • Don't use a per file salt. This shouldn't be a concern since I'm using a different IV for each file and so the salt could be generated once per client and makes it easier to cache keys in memory.

Wanted to give an update. I cleaned up the code, added documentation, and added some tests that take random data of varying sizes, encrypt it, then decrypt it to make sure everything matches. I want to also have some static files encrypted and uploaded for testing to make sure that they are always decryptable. I also might look into adding some way to speed up/bypass the key generation so each test case isn't taking 1-2s.

The next thing will be to figure out a good way to be able to create or fix issues while maintaining backward compatibility so people will always be able to get their files back. My current idea is to create some mapping of int32 to decryption method then prepend that to the file. So then when an old file is downloaded we can look up how to decrypt it.

@sselph thank you thank you. That's great news!

I want to also have some static files encrypted and uploaded for testing to make sure that they are always decryptable. I also might look into adding some way to speed up/bypass the key generation so each test case isn't taking 1-2s.

Please let me know if you'd like files of varying sizes, and I can make a collection of these for you.

Aha great point about maintaining backward compatibility, sure we can do that. In the worst case, we can have an identifier say const EncryptorVersion = "0.0.1" baked into the binary and if things go wrong we can just look up that value in their binary and point them to a version of drive that matches that encryptor.

Please let me know how I could be of help in making this possible.

I think I can send a PR tonight or tomorrow. Is there a branch you'd like me to send it to or just send it to master? It still needs some work but figured it would be best to get it uploaded. This isn't integrated so shouldn't affect the stability of the code but thought you might want to have it somewhere else until it is complete and tested.

Sounds gucci. For sure, a PR to master would be alright. We can iterate on it, no worries.

I wrote up a quick wiki page on how the library works. I thought it might help if someone wants to check the implementation for errors but can't read Go.

https://github.com/odeke-em/drive/wiki/End-to-End-Encryption

Is there any interest in adding an option for public key encryption (rather than password)? Or possibly an option that simply pipes out/in some command to encrypt/decrypt, like gpg?

Happy to open a separate issue if that seems sensible.

I think the piping is already supported using drive push -piped and drive pull -piped; see this for an example: https://github.com/odeke-em/drive/wiki/%5BUse-Case%5D-Backup,-archive,-encode-and-push-directory.#above-encrypted-with-openssl-aes-256

Hello there @shazow, thanks for the request and welcome to drive!

Yap, like @sselph mentioned in #543 (comment), we support piping for both push and pull. However, Go's x/crypto has OpenPGP support (https://godoc.org/golang.org/x/crypto/openpgp), so if you're interested please feel free.
@shazow please go ahead and open a separate issue and we can discuss on that.
Also, if I'm not mistaken, we'll need to work with dcrypto to serialize not just the version but also what kind of encryption was used. In order to do that though, we've still gotta wait for the Google crypto and OSS confirmations with @sselph.