Kronuz/Xapiand

Data Encryption

Thibaud-DT opened this issue · 3 comments

Hi!

I need to index sensitive data, and I would like to encrypt all data stored by Xapiand.

Do you have this enhancement in mind, or do you have any ideas on how to do that?

By reviewing the source code, I found that data is stored on the file system in a text file, and the same goes for the indexed data.

So, we can either encrypt the file system and decrypt it when we need to read or write something, or encrypt only the data and indexed data in the text file.

For the first option, I am thinking of partition encryption, but I have not given it enough thought yet.
For the second option, I found saltpack (https://saltpack.org/). It uses an asymmetric encryption system, which is not the best-suited solution. We could use AES, but we have to think about the key.

So, here I am. I tested Xapiand and it works really well. The stemming (which Xapiand does not do) is not perfect, especially for French content, but, yeah, French is hard!

Hope you can help me!

Regarding secure data: Xapiand stores indexes in several binary files, each of which is generally the representation of an inverted index in a B-tree. So even if the body of your data could be encrypted, the index terms would remain unencrypted. In my opinion, it would be better to encrypt the whole volume.
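
To illustrate why encrypting only the document body is not enough, here is a minimal sketch of a toy inverted index in Python. It is not how Xapiand stores its B-trees; the function name, the sample documents, and the ciphertext placeholder are all made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents.
docs = {
    1: "confidential salary report",
    2: "confidential medical record",
}
index = build_inverted_index(docs)

# Pretend the stored bodies were encrypted: these bytes are only a
# placeholder standing in for real ciphertext.
stored_bodies = {doc_id: b"<ciphertext>" for doc_id in docs}

# The terms are the keys of the index, so they stay readable on disk
# regardless of what happens to the stored bodies.
print(sorted(index))
print(index["confidential"])
```

Anyone who can read the index files sees the terms and which documents contain them, which is why encrypting the whole volume covers both the bodies and the index.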

Regarding stemming, you need to specify the language of the text field when indexing a document in an index for the first time (or specify a custom schema for a given index). See the first example here: https://kronuz.io/Xapiand/docs/reference-guide/schemas/field-types/text-type/ only you'd need to specify "_language": "fr" instead. That would use the French version of the Snowball stemmer for stemming.
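
As a rough sketch of what such a request body could look like, the Python snippet below builds the JSON for a first document. Only "_language": "fr" comes from this thread; the field name "content", the "_type" and "_value" keys, and the sample text are assumptions modeled on the linked documentation page, so check them against that page before relying on them.

```python
import json

# Hypothetical first document for a new index. The language must be set
# the first time the field is indexed, because that is when the schema
# for the field gets created.
document = {
    "content": {
        "_type": "text",      # assumed key, see the linked text-type docs
        "_language": "fr",    # from the thread: selects the French stemmer
        "_value": "Un poème sur la mer",  # assumed key, made-up sample text
    }
}

body = json.dumps(document, ensure_ascii=False, indent=2)
print(body)
```

The resulting string would be sent as the body of the indexing request for the first document of the index.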

I will look at this solution and see what I can do to minimize the latency impact. I will let you know.

No problem with the stemming, it works well. I was tricked by the schema specification. I posted a first element with no "_language" specified, and when I tried to add a new element with a "_language", Xapiand returned an error saying that I can't change the "_language" value of an element. I just updated the schema, and that resolved the issue.

The only problem with the French stemming implemented by Snowball is accents. If the word "poème" is indexed and I search for "poeme", Xapiand will not find it. I think I will remove all accents to resolve that.
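
A minimal sketch of that accent removal, done on the client side in Python before sending text to Xapiand. The helper name strip_accents is made up; it relies only on the standard unicodedata module and would have to be applied to both the indexed text and the search queries so that they match.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents) while keeping the base letters."""
    # NFD splits "è" into "e" followed by a combining grave accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keep everything else.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual composed form.
    return unicodedata.normalize("NFC", stripped)


print(strip_accents("poème"))  # poeme
```

Note that this also folds words that differ only by accents into the same form, which may or may not be acceptable depending on the content.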

Can we index some content without storing it?
For example, I store a tweet with PUT /tweet/1, whose data has a "content" key with the value "Samuel is now in live".
So in the data store, I would have the tweet ID and the indexed content, but not the content itself.
It's very specific, and yes, the content can be retrieved by looking at the indexed content, but that is less trivial to do. And it limits the amount of data to encrypt.