axiak/pybloomfiltermmap

Memory error on BloomFilter instantiation

Closed this issue · 2 comments

When I instantiate a BloomFilter object as the tutorial describes, I get a MemoryError:

>>> from pybloomfilter import BloomFilter
>>> fruit = BloomFilter(100000, 0.1, '/tmp/words.bloom')
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-2-b436a046df8e> in <module>()
----> 1 fruit = BloomFilter(100000, 0.1, '/tmp/words.bloom')

/home/oleiade/.python-eggs/pybloomfiltermmap-0.3.8-py2.7-linux-x86_64.egg-tmp/pybloomfilter.so in pybloomfilter.BloomFilter.__cinit__ (src/pybloomfilter.c:2347)()

MemoryError: 

Using git bisect, I found that the bug seems to have been introduced by commit be40e8c.

Starting with the master branch marked as bad:

$ git bisect start
$ git bisect bad
$ git bisect good 93621b8b9f365224d7d2948d6c5060c2b576db0c
be40e8cfc19f74e900ade94f220fddb247e9efbd is the first bad commit
commit be40e8cfc19f74e900ade94f220fddb247e9efbd
Author: Mike Axiak <maxiak@hubspot.com>
Date:   Mon Sep 24 23:41:48 2012 -0400

    Might have fixed crypto lib issue? Refs #22

:100644 100644 2da822fa61db7cbb89510a8bbbf5fcc02cf00d6f 02dbf8a3104e01a6578a6ff641e4dd64ec92a662 M  CHANGELOG
:100644 100644 12eb6769d3ea02c11726eeab5edbf284f736cf36 e51fa1d55b29e411529141345bf1c1ce9d28e950 M  setup.py
:040000 040000 82cc0956f7fddeca724e8d4dbd2305650e5cb1c1 778ad9afef070beba0afabfd5d8b3de5b860e1fd M  src
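
The bisect above could also be automated with `git bisect run`, which marks a commit good on exit code 0, bad on codes 1 to 127 (except 125), and skips it on 125. Below is a minimal sketch of such a check script. The helper names `bisect_exit_code` and `make_filter` are hypothetical, and it assumes the extension has been rebuilt and installed for the commit under test before the script runs.

```python
import sys

def bisect_exit_code(make_filter):
    """Map the outcome of make_filter() to a `git bisect run` exit code."""
    try:
        make_filter()
    except MemoryError:
        return 1      # bug reproduced: mark this commit bad
    except Exception:
        return 125    # unrelated failure (e.g. import problem): skip this commit
    return 0          # filter created: mark this commit good

def make_filter():
    # Same call as in the report. Importing here means an ImportError
    # is reported as "skip" rather than "bad".
    from pybloomfilter import BloomFilter
    BloomFilter(100000, 0.1, '/tmp/words.bloom')

# A script passed to `git bisect run` would end with:
#     sys.exit(bisect_exit_code(make_filter))
```

Returning 125 for anything other than MemoryError keeps commits that fail to build or import from being blamed for the bug.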

Hope it helps.

Note: the bug seems to be confirmed by Travis.

The bug is introduced at be40e8c#L3R114:

@@ -111,7 +111,7 @@ MBArray * mbarray_Create_Mmap(BTYPE num_bits, const char * file, const char * he
     array->size = (size_t)ceil((double)num_bits / sizeof(DTYPE) / 8.0);
     array->bytes = (size_t)ceil((double)num_bits / 

-    if (filesize < 0) {
+    if (filesize <= 0) {
         mbarray_Destroy(array);
         return NULL;
     }

The bug seems to be caused by the filesize <= 0 check: if you set it back to filesize < 0, the MemoryError disappears :)
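
One possible explanation, sketched in Python rather than C: a newly created file is empty, so if `filesize` holds the size of the backing file at that point (an assumption, since the excerpt does not show how it is computed), the `<= 0` check would reject every new filter file, while `< 0` would only reject error values. The function names `rejects_old` and `rejects_new` are hypothetical stand-ins for the two versions of the condition.

```python
import os
import tempfile

def rejects_old(filesize):
    # Condition before commit be40e8c: only a negative size is an error.
    return filesize < 0

def rejects_new(filesize):
    # Condition after commit be40e8c: a size of zero is also an error.
    return filesize <= 0

# Create an empty file and measure it, as would happen for a new filter path.
fd, path = tempfile.mkstemp(suffix=".bloom")
os.close(fd)
new_file_size = os.path.getsize(path)
os.remove(path)

print(new_file_size)               # 0
print(rejects_old(new_file_size))  # False
print(rejects_new(new_file_size))  # True
```

If that assumption holds, the stricter check turns every first-time creation into a NULL return, which would surface in Python as the MemoryError shown in the report.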

All tests pass except tests.accuracytest.AccuracyTestCase.test_strings:

======================================================================
FAIL: test_strings (tests.accuracytest.AccuracyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/oleiade/Dev/Sandbox/Python/pybloomfiltermmap/tests/accuracytest.py", line 28, in test_strings
    self.assertTrue(false_pos_rate <= accuracy*2, "accuracy fail: %0.4f > %0.4f" % (false_pos_rate, accuracy))
AssertionError: accuracy fail: 0.0030 > 0.0010

----------------------------------------------------------------------
Ran 8 tests in 0.704s
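
To make the failure above easier to read: the assertion allows a false positive rate of up to twice the requested accuracy, but the message prints the accuracy itself, not the doubled threshold. A small sketch of the same comparison, using the rounded values from the message (the exact measured values are not shown in the output, and `accuracy_ok` is a hypothetical helper name):

```python
def accuracy_ok(false_pos_rate, accuracy):
    # Same condition as the assertion on line 28 of tests/accuracytest.py.
    return false_pos_rate <= accuracy * 2

# Rounded values taken from the AssertionError message.
false_pos_rate, accuracy = 0.0030, 0.0010

print(accuracy_ok(false_pos_rate, accuracy))  # False
print("allowed up to %0.4f, measured %0.4f" % (accuracy * 2, false_pos_rate))
```

So the measured rate of roughly 0.0030 exceeds the effective limit of 0.0020, not just the 0.0010 shown in the message.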

PS: I'd rather not fork and open a pull request, as I don't know the code well enough to check whether this change breaks something.

Just wanted to confirm this: I'm getting the same error as oleiade. I previously used pybloomfilter with no problem, but recent versions raise a MemoryError.