MatrixAI/js-encryptedfs

File Integrity - Merkle Tree for Encrypted Chunks

Closed this issue · 8 comments

There is no restriction to using only a single key for encryption with EFS; it is possible to supply a different key with each new instantiation of EFS. It would be beneficial to ensure that when an encrypted file is opened, the key loaded into the EFS is the same as the key that the file was encrypted with.

This helps preserve the integrity of the file by not performing any operation unless the key is verified to be the same. It can be argued that using AES-GCM already provides integrity, but there is a difference, albeit a subtle one.

This key validation measure is designed to prevent the user from destroying the integrity of the file. For example, there is nothing stopping the user from writing a block anywhere in an existing file using a different key from the one it was written with, even when using AES-GCM. Key validation on open() can prevent this. The most common case here would not involve a malicious attacker, but rather user error.

The integrity provided by GCM is designed to alert the user that data modification has occurred. Such modification would have happened outside the EFS universe, and generally implies that the file has been maliciously tampered with or corrupted, so the data is not to be trusted.

One way to perform the key validation is to hash the key and put it in the metadata header of a file when it is open(*, w*)'d. On any open(*, !w*) of a file, the key the EFS is loaded with will be hashed and compared to the one in the header of the file.
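A minimal sketch of that idea, using SHA-256 from Node's crypto module; the function names and header layout are illustrative, not the actual EFS API. (Also worth noting: a plain hash of the key leaks a stable fingerprint of the key material, so deriving the check value via an HMAC or KDF may be preferable.)

```ts
import { createHash } from 'crypto';

// Hypothetical key fingerprint; not the actual EFS header format.
function keyFingerprint(key: Buffer): Buffer {
  return createHash('sha256').update(key).digest();
}

// On open(*, w*): store the fingerprint in the file's metadata header.
function headerFingerprintFor(key: Buffer): Buffer {
  return keyFingerprint(key);
}

// On open(*, !w*): refuse to operate if the loaded key does not match
// the fingerprint recorded in the header.
function validateKey(key: Buffer, headerFingerprint: Buffer): void {
  if (!keyFingerprint(key).equals(headerFingerprint)) {
    throw new Error('EFS key does not match the key this file was encrypted with');
  }
}
```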

@ll-aashwin-ll We had a discussion about this yesterday. Does that mean this is already done?

Note that I believe this only works at the chunk level. So a user would still be able to modify any of the uncorrupted chunks before they encounter the mismatching chunk. I feel like this could be problematic. Maybe we could instead maintain a total hash calculated over all chunks.

Please see if we can make use of https://en.wikipedia.org/wiki/Merkle_tree to verify a full file without having to re-hash every chunk of a large file?
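A sketch of how this could work, assuming SHA-256 leaves over per-chunk hashes; this is a generic merkle construction, not tied to keybase's or anyone else's implementation. Verifying one chunk against a stored root only needs the audit path (O(log n) hashes) instead of re-hashing the whole file.

```ts
import { createHash } from 'crypto';

const sha256 = (data: Buffer): Buffer =>
  createHash('sha256').update(data).digest();

// Compute the merkle root over per-chunk hashes (the leaves).
function merkleRoot(leaves: Buffer[]): Buffer {
  let level = leaves.length > 0 ? leaves : [sha256(Buffer.alloc(0))];
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      const right = level[i + 1] ?? left; // duplicate last node on odd-sized levels
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return level[0];
}

// Collect the sibling hashes (audit path) needed to recompute the root
// from the single leaf at `index`.
function auditPath(leaves: Buffer[], index: number): Buffer[] {
  const path: Buffer[] = [];
  let level = leaves;
  let i = index;
  while (level.length > 1) {
    path.push(i % 2 === 0 ? (level[i + 1] ?? level[i]) : level[i - 1]);
    const next: Buffer[] = [];
    for (let j = 0; j < level.length; j += 2) {
      next.push(sha256(Buffer.concat([level[j], level[j + 1] ?? level[j]])));
    }
    level = next;
    i = Math.floor(i / 2);
  }
  return path;
}

// Verify one chunk against the stored root using only its audit path.
function verifyChunk(chunk: Buffer, index: number, path: Buffer[], root: Buffer): boolean {
  let hash = sha256(chunk);
  let i = index;
  for (const sibling of path) {
    hash = i % 2 === 0
      ? sha256(Buffer.concat([hash, sibling]))
      : sha256(Buffer.concat([sibling, hash]));
    i = Math.floor(i / 2);
  }
  return hash.equals(root);
}
```

A single total hash over all chunks would also detect corruption, but then every chunk has to be re-hashed on each verification and on each write; the merkle tree keeps per-chunk verification and per-chunk updates logarithmic.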

Can you change this issue title to be about file integrity?

Looking at keybase's merkle tree: in order to get an on-disk merkle tree, we have to supply it an abstract object that supports key-value storage. These keys look like merkle tree keys, and the stored nodes themselves refer to other merkle tree keys, thus creating a tree. However, on-disk structures have to refer to each other by disk addresses. So to provide this key-value interface, we then need an on-disk key-value data structure. We could write our own btree to do this, but I think we can also make use of the many key-value databases that already exist. The only problem is that we need this database to be embedded and lightweight. Perhaps so lightweight that it could be a block of data tagged along with the encrypted file. Perhaps a side-car file for every file that is stored inside a vault?
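To make the abstraction concrete, the on-disk merkle tree would be built against something like the following interface; the names are illustrative, not keybase's actual API. Nodes refer to their children by keys (e.g. their hashes) rather than by disk addresses, and the backing store resolves those keys to serialized nodes.

```ts
// Illustrative key-value abstraction backing an on-disk merkle tree.
interface NodeStore {
  get(key: Buffer): Promise<Buffer | undefined>;
  put(key: Buffer, value: Buffer): Promise<void>;
  del(key: Buffer): Promise<void>;
}

// A merkle node refers to its children by their keys, so the store maps
// key -> node instead of disk-address -> node.
interface MerkleNode {
  hash: Buffer;
  leftKey?: Buffer;
  rightKey?: Buffer;
}
```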

Examples of possible embedded key-value on-disk structures:

A question is: if we do use a side-car file, then this file-integrity data is also encrypted-fs metadata, and this metadata relates to the other metadata mentioned in #8. Again this would mean plaintext file content and plaintext POSIX attributes should be cleanly separated from encrypted-fs metadata, including these file-integrity aspects. It may then become better to keep all encrypted-fs metadata separate.

Consider this low priority for now, as we should focus on the js-polykey and Polykey issues.

To clarify:

This library deals with chunk-integrity already. But this issue is addressing file-integrity.

A related issue is vault integrity. We don't know how this can be achieved yet.

Also a question is whether integrity is maintained over the plaintext or the ciphertext. If the merkle tree is maintained over the ciphertext, different keynodes sharing the same vault would be using different vault keys, so the merkle trees would not be shareable and would be limited to within one keynode. Alternatively, and I think this is the direction we will go, we could maintain the merkle tree over only the plaintext, which allows us to share the merkle tree along with sharing the vault.
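A quick illustration of the difference, using AES-256-GCM from Node's crypto module as a stand-in for the EFS chunk cipher; the function names here are placeholders, not the EFS API.

```ts
import { createHash, createCipheriv, randomBytes } from 'crypto';

const sha256 = (d: Buffer): Buffer => createHash('sha256').update(d).digest();

// Placeholder chunk encryption standing in for the EFS cipher.
function encryptChunk(key: Buffer, chunk: Buffer): Buffer {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  return Buffer.concat([iv, cipher.update(chunk), cipher.final(), cipher.getAuthTag()]);
}

const chunk = Buffer.from('some plaintext chunk');
const keyA = randomBytes(32); // vault key on keynode A
const keyB = randomBytes(32); // vault key on keynode B

// Leaves over ciphertext depend on the vault key (and IV), so each keynode
// would end up with a different merkle tree for the same vault content:
const leafA = sha256(encryptChunk(keyA, chunk));
const leafB = sha256(encryptChunk(keyB, chunk));

// Leaves over plaintext are key-independent, so the same merkle tree can be
// shared along with the vault:
const leafPlain = sha256(chunk);
```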

Closing on account of migration to gitlab