Wrong result with large filter
mrqc opened this issue · 7 comments
I expect that if I ask the filter to check for membership and it tells me FALSE, then it's definitely NOT a member. I did the following:
```python
from probables import BloomFilter

def verifyMembership(key):
    global bloom
    if key in bloom:
        print('Its possibly in')
    else:
        print('Definitly not in')

key = 'some'
filterFile = 'index.dat'
bloom = BloomFilter(est_elements=100000000, false_positive_rate=0.03, filepath=filterFile)
verifyMembership(key)
bloom.add(key)
verifyMembership(key)
bloom.export(filterFile)
```
I called my script twice and the output is:
```text
Definitly not in
Its possibly in
Definitly not in
Its possibly in
```
But I would expect:
```text
Definitly not in
Its possibly in
Its possibly in
Its possibly in
```
If I reduce est_elements to, let's say, 10000, then it works fine.
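For context on why est_elements matters here, the standard optimal-sizing formulas give a bit array of m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n expected elements at false-positive rate p. The sketch below assumes pyprobables sizes its filter along these lines (its exact rounding may differ slightly); `bloom_parameters` is a hypothetical helper, not part of the library.

```python
import math


def bloom_parameters(n, p):
    """Hypothetical helper: standard optimal Bloom filter sizing.

    Returns (m, k): the number of bits and the number of hash functions
    for n expected elements at false-positive rate p.
    """
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


for n in (10_000, 100_000_000):
    m, k = bloom_parameters(n, 0.03)
    print(f"n={n:>11,}  m={m:>13,} bits (~{m / 8 / 1e6:.2f} MB)  k={k}")
```

With p = 0.03 this works out to roughly 730 million bits (about 91 MB) for 100,000,000 elements versus about 73,000 bits (about 9 KB) for 10,000, with k = 5 in both cases, so the large filter writes and reads a far bigger file on every export and reload.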
So, running your script with a few changes (mostly not using the global variable), I got the results you were expecting. The code I used:
```python
from probables import BloomFilter

def verifyMembership(blm, key):
    if key in blm:
        print('Its possibly in')
    else:
        print('Definitly not in')

key = 'some'
filterFile = 'index.dat'
blm = BloomFilter(est_elements=100000000, false_positive_rate=0.03, filepath=filterFile)
verifyMembership(blm, key)
blm.add(key)
verifyMembership(blm, key)
blm.export(filterFile)

# test loading it
blm2 = BloomFilter(est_elements=100000000, false_positive_rate=0.03, filepath=filterFile)
verifyMembership(blm2, key)
blm2.add(key)
verifyMembership(blm2, key)
blm2.export(filterFile)
```
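To illustrate the invariant being tested here (a key that was added and exported must still be found after the filter is reloaded from disk), the following is a minimal, self-contained Bloom filter sketch. `MiniBloom`, `run_once`, and the raw-bit-array file layout are hypothetical and are not pyprobables' API or on-disk format; a small est_elements keeps the demo light, and three calls to `run_once` stand in for three separate runs of the script against the same file.

```python
import hashlib
import math
import os
import tempfile


class MiniBloom:
    """Hypothetical minimal Bloom filter for illustration (not pyprobables)."""

    def __init__(self, est_elements, false_positive_rate):
        n, p = est_elements, false_positive_rate
        # Standard sizing: m bits and k hash functions.
        self.m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, key):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for idx in self._indexes(key):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, key):
        # False only if some bit is unset, so an added key can never report False.
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(key))

    def export(self, path):
        with open(path, "wb") as f:
            f.write(self.bits)

    @classmethod
    def load(cls, path, est_elements, false_positive_rate):
        blm = cls(est_elements, false_positive_rate)
        with open(path, "rb") as f:
            data = f.read()
        if len(data) != len(blm.bits):
            raise ValueError("file size does not match filter parameters")
        blm.bits = bytearray(data)
        return blm


def run_once(path, key, n=10_000, p=0.03):
    """One simulated script run: load if the file exists, check, add, check, export."""
    blm = MiniBloom.load(path, n, p) if os.path.exists(path) else MiniBloom(n, p)
    results = [key in blm]
    blm.add(key)
    results.append(key in blm)
    blm.export(path)
    return results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.dat")
    runs = [run_once(path, "some") for _ in range(3)]

print(runs)  # [[False, True], [True, True], [True, True]]
```

Only the very first check on a fresh, all-zero filter can return False; every later run reloads the exported bits and finds the key, which is the behavior the original report expected.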
So far, I am unable to reproduce it.
It could be something that was fixed in version 0.4.1, which I haven't pushed yet. I will cut that release, and hopefully that will fix your issue. You would need to update your version of pyprobables.
Maybe that's the reason. Hopefully it's fixed in 0.4.1. But one question: if you run this (adapted) script twice:
```python
from probables import BloomFilter

def verifyMembership(key, bloomFilter):
    if key in bloomFilter:
        print('Its possibly in')
    else:
        print('Definitly not in')

key = 'some'
filterFile = 'index.dat'
bloomFilter = BloomFilter(est_elements=100000000, false_positive_rate=0.03, filepath=filterFile)
verifyMembership(key, bloomFilter)
bloomFilter.add(key)
verifyMembership(key, bloomFilter)
bloomFilter.export(filterFile)
```
...you get the expected result, right? If so, then I'm fine and looking forward to 0.4.1. Because for me, the runs look like this (very strange to me):
```console
$ rm index.dat
$ python3 parse.py
Definitly not in
Its possibly in
$ python3 parse.py
Definitly not in
Its possibly in
$ python3 parse.py
Its possibly in
Its possibly in
$
```
Here, parse.py is the script above.
In my version of the script, both passes ran back to back, so it only had to be run once. When I ran your version twice, I didn't see the issue either. Version 0.4.1 has been pushed and will hopefully fix what you are seeing.
As for your other reply, I'm not sure I understand what you are referring to about parse.py.
Thanks, I appreciate that! parse.py is simply the filename of my Python script. ;)
Yea, that fixed it! Thanks!