Callidon/bloom-filters

Serialized filters: are they safe to expose publicly?

zxpectre opened this issue · 4 comments

Hi, I'm working on a blockchain, on-chain scalable filtering solution, and I wish to know whether an attacker could gain an advantage if the entire serialized filter is stored publicly.

If that's the case, are there any suggestions to follow, like keeping some parts of the filter off-chain maybe? The filter should be exposed somehow to decentralize the filtering.

The error rate should stay as low as possible, and the filter content size can vary.

I'm evaluating this nice library for the job.

Btw: I tried setting the error rate to zero just for testing and browser CPU usage went crazy. I know it was a utopian case, but I'm reporting it in case it's useful for you guys.

I tried setting the error rate to zero just for testing and browser CPU usage went crazy

Yes, most filters use the error rate to compute their size at initialization. So an error rate of 0 most likely means an infinite size, since Math.log(0) = -Infinity.
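To make that concrete, here is a sketch (not this library's actual code) of the classic optimal Bloom filter sizing formula, m = -(n * ln p) / (ln 2)^2 bits for n items at error rate p, showing why p = 0 blows up:

```javascript
// Optimal number of bits m for n items at target error rate p.
// An error rate of 0 propagates Math.log(0) === -Infinity into the size.
function optimalSize(nbItems, errorRate) {
  return Math.ceil(-(nbItems * Math.log(errorRate)) / Math.LN2 ** 2);
}

console.log(optimalSize(1000, 0.01)); // 9586 bits for 1000 items at 1% error
console.log(optimalSize(1000, 0));    // Infinity
```

Anything that then tries to allocate or loop over that size will hang the browser, which matches what you observed.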

are they safe to expose publicly?

IMHO, completely exposing the filter might not be a good idea, because the seed is exposed with it. But if you keep the seed hidden and customize the hash functions so they differ from what is used in this package (because it is open-source), you may succeed in keeping your data safe while exposing the filter publicly.
Keeping the seed hidden means an attacker has to use some kind of brute-force attack to recover it. Even if he knows which data can be inserted into the filter, pre-hashing the data before insertion will almost certainly ensure that the attacker cannot reverse-engineer both your pre-hash function and the seed used by the filter.

I'm closing the issue since it is not a bug. Feel free to continue the discussion.

Thanks for the detailed response, I will follow your suggestions. It's a pity, because it would be nice to have a publicly auditable Bloom-filter-based protocol for this. The use case is mainly a membership check on a sorted set of transaction hashes, and Merkle trees somehow are not fitting well, due to proof sizes and the requirement of holding the entire tree in order to generate proofs. Bloom filters seem like a good fit, I guess.

If your problem is comparing sets, I suggest you read the Invertible Bloom Filter (or Invertible Bloom Lookup Table; IBLT) part of this project. You get comparisons in O(d) space, where d is the number of differences between the sets, which is very useful for append-only sets and very space-efficient. For example, if you synchronize between remote sites with 10 updates before each synchronization, you can use a structure of at most 15 cells (alpha = 1.5). So if you store 32-byte hashes, you'll have 32 * 10 * 1.5 = 480 bytes in your structure.
But if you need to count elements in the set, then a Counting Bloom Filter might be what you are looking for.
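The space estimate above can be sketched as follows; the function name and the simplification of ignoring per-cell bookkeeping (count and checksum fields) are mine, not the library's:

```javascript
// Rough IBLT payload size: alpha * d cells for d expected set
// differences, each cell holding one stored hash (per-cell count and
// checksum overhead is ignored in this estimate).
function ibltPayloadBytes(expectedDiffs, hashBytes, alpha = 1.5) {
  const cells = Math.ceil(alpha * expectedDiffs);
  return cells * hashBytes;
}

// 10 updates between syncs, 32-byte hashes, alpha = 1.5:
console.log(ibltPayloadBytes(10, 32)); // 480 bytes across 15 cells
```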