axiak/pybloomfiltermmap

How do I load a bloomfilter file?

Closed this issue · 13 comments

Thanks for the library,

Take this example:

words = pybloomfilter.BloomFilter(100000, 0.1, '/tmp/words.bloom')

Where words.bloom contains an older bloomfilter that I want to load and work with.

How do I do that?

axiak commented

If the file already exists, that constructor will reuse the existing bloom filter (as long as the parameters match, otherwise it will throw).

"As long as the parameters match"

By that, do you mean I have to supply the same error rate and capacity that were used to create the file in the first place?

axiak commented

Right.

[Screenshot from 2016-06-17 01:36:32]

Is there a test I can run for that? My manual tests in Python 3 don't seem to work.

axiak commented

What does your test look like?

axiak commented

(Did you make sure to close the old bloom filter before checking?)
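A minimal sketch of the close-then-reload check being discussed here. It assumes the `sync()` and `close()` methods and the `BloomFilter.open(filename)` class method described in the library's documentation; the function name `reload_roundtrip` and its default parameters are hypothetical, and nothing here has been verified against a particular version or against Python 3.

```python
def reload_roundtrip(path, capacity=1000, error_rate=0.1):
    """Create a filter at `path`, close it, reopen it, and check membership."""
    # Third-party dependency, imported here so the function can be defined
    # even where the package is not installed.
    import pybloomfilter

    bf = pybloomfilter.BloomFilter(capacity, error_rate, path)
    bf.add("apple")
    bf.sync()    # ask for the memory-mapped data to be flushed to disk
    bf.close()   # release the mapping before the file is opened again

    # Assumption: BloomFilter.open() reads the parameters from the existing
    # file, so capacity and error rate do not need to be supplied again.
    reopened = pybloomfilter.BloomFilter.open(path)
    try:
        return "apple" in reopened
    finally:
        reopened.close()
```

Calling `reload_roundtrip` with a path that does not exist yet should return `True` if saving and loading both work; a `False` result would point at the reload step or at how keys are hashed.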

Close? I think I need instructions on that.

(I'd also appreciate it if you'd add such instructions to the README.md file)

Is this the right implementation? Because it does not seem to work.

[Screenshot from 2016-06-17 03:08:08]

When I open the file manually, I see that many bits have changed from 0000 to something else, so the file is being saved correctly. I suppose it is the loading that is broken.

Example:

4d42 4954 4152 5241 5980 0000 0000 0000
00b0 0400 0005 0000 0000 0000 009a 9999
9999 99b9 3f03 0000 001c 8656 ade3 a172
6fe1 9477 eb00 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
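As a rough aid for reading the dump above, the sketch below decodes two byte runs from its first three rows using only the standard library. It only interprets raw bytes. Treating the leading bytes as a magic string and the 8-byte run as the stored error rate is an inference from the values themselves, not a statement about the library's actual header layout.

```python
import struct

# First three rows of the dump, transcribed byte for byte.
dump = """
4d42 4954 4152 5241 5980 0000 0000 0000
00b0 0400 0005 0000 0000 0000 009a 9999
9999 99b9 3f03 0000 001c 8656 ade3 a172
"""
header = bytes.fromhex("".join(dump.split()))

# The first nine bytes are printable ASCII.
magic = header[:9]

# Search for the little-endian IEEE 754 encoding of 0.1, the error rate
# used in the example at the top of the thread.
offset = header.find(struct.pack("<d", 0.1))
(error_rate,) = struct.unpack_from("<d", header, offset)

print(magic, offset, error_rate)
```

The leading bytes spell `MBITARRAY`, and the encoding of 0.1 does appear in the header, which is consistent with the file having been written with the error rate from the example.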

I tried another approach: I initialized a new filter on a new, empty file and performed a copy.
When opened raw, the new file has the same values as the old file, but the bloom filter still returns False for existing values...

axiak commented

Does it work without saving to disk? I haven't tested on Python 3 at all, so it's entirely possible it's a string encoding issue.
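To illustrate the encoding hypothesis: in Python 3, text (`str`) and `bytes` are different types, so the same word can reach a hashing routine in two different forms. The sketch below uses a hypothetical helper, `to_key`, to normalize items to bytes before they are added or looked up. Whether the filter accepts bytes keys under Python 3 is an assumption, not something confirmed in this thread.

```python
def to_key(item, encoding="utf-8"):
    """Return `item` as bytes so the same word always yields the same key."""
    if isinstance(item, str):
        return item.encode(encoding)
    return bytes(item)


# In Python 3, text and bytes never compare equal.
assert "apple" != b"apple"

# After normalization, both spellings produce the identical key.
assert to_key("apple") == to_key(b"apple") == b"apple"
```

Using the same normalization on both the write path and the lookup path would rule out a mismatch between how a key was stored and how it is queried.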

I tried copying before checking but it seems to return False on existing values.

I suspect it's an encoding thing as you said.

If so, I think reopening the issue would be a valid thing to do.

@jshrt Have you solved the problem now?