bits-and-blooms/bloom

Applying the Bloom filter to HASH160 addresses

oguzdelioglu opened this issue · 11 comments

Example Address List.

0af80f98e511577e5c87d56069896c70fe4a6263
0af81021340c23bb20429e47c3bdd62137c322d3
0af811994346cdca9d6dbe8e7da4f3393130fc71
0af81250856d377cf7f6914ebe196a9570ada313
0af814b46c4ef5c57dbc53ae03c812eea37d4aac
0af816bb57c0adac8bca0401d295724cd80b0956
64dfb83dcca1f516d2bac93c8c9e4eb6513fd364
64dfb8c23802f3aa67c9078c0e7970fa17584892
64dfb8c5d21743ba3dd8cef5b576e1ddf6d4c239
64dfb931e9d68aa68842efa3c5d0bc3c1101b5ec
64dfba34a39823c273a33fadf49112c063ec3724

It gives quite a lot of false positives, and for this reason I cannot use it.
I want to use a Bloom filter to filter about 25 million HASH160 addresses.
How can I configure the settings more efficiently?

I want to use it for this project.
https://github.com/oguzdelioglu/odelbrain

Can you please run the following code...

    // Sized for 25 million entries at a 1% target false-positive rate.
    filter := bloom.NewWithEstimates(25000000, 0.01)
    filter.Add([]byte("0af80f98e511577e5c87d56069896c70fe4a6263"))
    filter.Add([]byte("0af81021340c23bb20429e47c3bdd62137c322d3"))
    filter.Add([]byte("0af811994346cdca9d6dbe8e7da4f3393130fc71"))
....

You should get very close to a 1% false-positive rate. If you do not, please report back on what you do get.

I am going to close this. Please run the suggested code above or, at least, provide a full code sample so we can verify the false-positive issue.

Note that when creating the Bloom filter, you need to know both the capacity and the desired false-positive rate in advance. This is not a limitation of this library; it is how Bloom filters work.

If you keep adding entries to the same Bloom filter beyond the initial capacity, the false-positive rate will go up.
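
To make this concrete, here is a minimal sketch (assuming the current github.com/bits-and-blooms/bloom/v3 import path) that sizes a filter for 1,000 entries at a 1% rate, overfills it with ten times that many keys, and then measures the false-positive rate empirically:

	package main

	import (
		"encoding/binary"
		"fmt"

		"github.com/bits-and-blooms/bloom/v3"
	)

	func main() {
		// Sized for 1000 entries at a 1% target false-positive rate.
		f := bloom.NewWithEstimates(1000, 0.01)

		// Overfill: insert 10x the capacity the filter was sized for.
		for i := uint32(0); i < 10000; i++ {
			n := make([]byte, 4)
			binary.BigEndian.PutUint32(n, i)
			f.Add(n)
		}

		// Probe 10000 keys that were never added; every hit is a false positive.
		count := 0
		for i := uint32(0); i < 10000; i++ {
			n := make([]byte, 4)
			binary.BigEndian.PutUint32(n, i+1000000)
			if f.Test(n) {
				count++
			}
		}
		fmt.Printf("measured fp rate at 10x capacity: %.3f\n", float64(count)/10000.0)
	}

At ten times the configured capacity, the measured rate comes out far above the 1% the filter was sized for.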

bloom.NewWithEstimates(uint(addressCount), 0.01)

addressCount is the total number of wallets (Count: 24605923).

When I set the rate to 0.01, I get an exorbitantly high number of false positives.

I generated random wallets = 5229707
Found false positives = 53393

@oguzdelioglu If you find that bloom.NewWithEstimates does not work, please provide a fully reproducing code sample. Create a new issue.

https://github.com/oguzdelioglu/odelbrain/blob/master/main.go

Total Wallet: 24605923
Elapsed Time = 1m0.3868661s
Generated Wallet = 6867766
Generate Speed Avg(s) = 114462
Found = 69115

It produces about 70,000 false positives in approximately 7 million queries.

I am now using a false-positive rate of 0.0000001 and it is not bad. But maybe it could be even better.

@oguzdelioglu Your code is a bit complicated. Can you trim it down to as few lines as possible? There should be no need for goroutines, for example.

I use goroutines to brute-force faster across more cores.

Does the Bloom filter not work well with goroutines?

I first define it as follows, then I add the addresses, then I check.

	// Settings: size the filter for the full address list at the target fp rate.
	sbf = bloom.NewWithEstimates(uint(addressCount), 0.0000001)

	// Insert every address into the Bloom filter.
	for _, address := range addressList {
		sbf.Add([]byte(address))
	}

	// Check whether a generated wallet may exist in the set.
	if sbf.Test([]byte(randomWallet.base58BitcoinAddress)) {
		SaveWallet(randomWallet)
		totalFound++
	}
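
One thing to check in a setup like this: as far as I know, the BloomFilter type in this library does not synchronize access internally, so calling Add or Test from multiple goroutines at the same time without a lock is a data race. If all Add calls finish before the worker goroutines start, concurrent read-only Test calls are fine; otherwise the filter needs to be guarded. A minimal sketch of one way to do that (safeBloom is a hypothetical wrapper, not part of the library; import path assumed to be github.com/bits-and-blooms/bloom/v3):

	package main

	import (
		"sync"

		"github.com/bits-and-blooms/bloom/v3"
	)

	// safeBloom is a hypothetical wrapper that guards a shared filter
	// with a read-write mutex so goroutines can use it concurrently.
	type safeBloom struct {
		mu sync.RWMutex
		f  *bloom.BloomFilter
	}

	func (s *safeBloom) Add(b []byte) {
		s.mu.Lock() // exclusive: Add mutates the underlying bit set
		defer s.mu.Unlock()
		s.f.Add(b)
	}

	func (s *safeBloom) Test(b []byte) bool {
		s.mu.RLock() // shared: Test only reads bits
		defer s.mu.RUnlock()
		return s.f.Test(b)
	}

	func main() {
		sb := &safeBloom{f: bloom.NewWithEstimates(1000, 0.01)}
		sb.Add([]byte("example"))
		_ = sb.Test([]byte("example"))
	}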

Please run the following and tell me what float64(count)/1000.0 is at the end of the routine.

	package main

	import (
		"encoding/binary"
		"fmt"

		"github.com/bits-and-blooms/bloom/v3"
	)

	func main() {
		// Size the filter for 1000 items at a 0.1% target false-positive rate.
		f := bloom.NewWithEstimates(1000, 0.001)
		for i := uint32(0); i < 1000; i++ {
			n := make([]byte, 4)
			binary.BigEndian.PutUint32(n, i)
			f.Add(n)
		}

		// Query 1000 keys that were never added; every hit is a false positive.
		count := 0
		for i := uint32(0); i < 1000; i++ {
			n := make([]byte, 4)
			binary.BigEndian.PutUint32(n, i+1000)
			if f.Test(n) {
				count++
			}
		}
		fmt.Println(float64(count) / 1000.0)
	}

This adds the keys 0 through 999 and then checks whether the keys 1000 through 1999 are in the set. Since none of those keys were added, the fraction that tests positive is a direct measure of the false-positive probability (fpp); with these parameters it should come out near 0.001.

If you believe that there is a bug, then you need to present such a simple example.

I cannot go through your own code.

The first expectation you should have is that your code is at fault. This library has been used by hundreds of people for years. So if you think that there is a bug in the library (and not in your code), you have to come up with a carefully crafted, reproducible test case. Do not submit a large blob of complicated code. If you do, then my assumption is that the bug is in your code.

After 10 billion tests I now get 2000 false positives.
I set the fp rate to 0.0000001.

I was able to minimize false positives with this setting.

I don't think it can be reduced much further :)

I expect that you might be misusing the library. The fp rate that you specify should match very closely the fp rate that you measure empirically; note that your earlier numbers (69115 false positives in 6867766 tests, about 1.0%) match the 0.01 rate you had configured almost exactly. If not, please build a simple example as I suggested above to demonstrate the issue.

The fp rate that you specify is unlikely to be practical. It is very low.
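
For a rough sense of why (a back-of-the-envelope sketch; the numbers below are my own calculation, not from this thread): with the standard sizing formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2, a filter for about 24.6 million entries at p = 0.01 needs roughly 236 million bits (about 28 MB) and 7 hash functions, while p = 0.0000001 needs roughly 825 million bits (about 98 MB) and 24 hash functions evaluated on every Add and Test. You can check what the library would allocate with its EstimateParameters helper:

	package main

	import (
		"fmt"

		"github.com/bits-and-blooms/bloom/v3"
	)

	func main() {
		// Compare the bit-array size m and hash count k the library derives
		// for the same capacity at two target false-positive rates.
		for _, p := range []float64{0.01, 0.0000001} {
			m, k := bloom.EstimateParameters(24605923, p)
			fmt.Printf("p=%g: m=%d bits (~%d MB), k=%d hashes\n", p, m, m/8/1024/1024, k)
		}
	}

So the very low rate costs roughly 3.5x the memory and over 3x the hashing work per operation.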