An empty Bloom filter is a bit array of m bits, all set to 0. There must also be hashNumber different hash functions defined, each of which maps or hashes some set element to one of the m array positions, generating a uniform random distribution. Typically, hashNumber is a constant, much smaller than m, which is proportional to the number of elements to be added; the precise choice of hashNumber and the constant of proportionality of m are determined by the intended false positive probability of the filter.
If you are not familiar with it, you can refer to Wikipedia's detailed analysis.
If you use python, you can refer to the bloomfilter written in python
This project implements bloomfilter and can select and extend different BitSets or other storage methods. For example, BitSet supports Java
and Redis
. It can also be applied to scenarios where various data levels and different false positive probability are required.
Release Log:
bloomfilter version 1.0.0 Released
Download source code and binary jar package on the release page.
This is not necessary now. The dependency uploaded to the maven central repository.
git clone https://github.com/wxisme/bloomfilter.git
cd bloomfilter
mvn clean install
You can also download source code or binary jar package on the release page directly.
<dependency>
<groupId>com.github.wxisme</groupId>
<artifactId>bloomfilter</artifactId>
<version>1.0.0</version>
</dependency>
public class JavaBitSetTest {
public static void main(String[] args) {
//(falsePositiveProbability, expectedNumberOfElements)
BloomFilter<String> filter = new BloomFilter<String>(0.0001, 10000);
filter.bind(new JavaBitSet());
filter.add("filter");
System.out.println(filter.contains("filter"));
System.out.println(filter.contains("bloom"));
filter.add("bitset");
filter.add("redis");
System.out.println(filter.contains("bitset"));
System.out.println(filter.contains("redis"));
System.out.println(filter.contains("mysql"));
System.out.println(filter.contains("linux"));
System.out.println(filter.count());
System.out.println(filter.isEmpty());
filter.clear();
System.out.println(filter.isEmpty());
System.out.println(filter.contains("filter"));
}
}
Expected test result is :
true
false
true
true
false
false
3
false
true
false
You can easily use bloomfilter on redis,here is a simple example:
public class RedisBitSetTest {
public static void main(String[] args) {
//Don't forget auth password, you better use the configured redis client connection.
//It should be noted that bloomfilter is not responsible for closing and returning redis connection resources.
//(falsePositiveProbability, expectedNumberOfElements)
BloomFilter<String> filter = new BloomFilter<String>(0.0001, 10000);
Jedis jedis = new Jedis("127.0.0.1", 6379);
jedis.auth("1234");
filter.bind(new RedisBitSet(jedis, "bloomfilter:key:name"));
//if you have a redis cluster
//Set<HostAndPort> nodes = new HashSet<>();
//nodes.add(new HostAndPort("127.0.0.1", 6379));
//filter.bind(new RedisBitSet(new JedisCluster(nodes), "bloomfilter:key:name"));
//you can also use jedispool
//JedisPool jedisPool = new JedisPool("127.0.0.1", 6379);
//Jedis jedis = jedisPool.getResource();
//filter.bind(new RedisBitSet(jedis, "bloomfilter:key:name"));
filter.add("filter");
System.out.println(filter.contains("filter"));
System.out.println(filter.contains("bloom"));
filter.add("bitset");
filter.add("redis");
System.out.println(filter.contains("bitset"));
System.out.println(filter.contains("redis"));
System.out.println(filter.contains("mysql"));
System.out.println(filter.contains("linux"));
System.out.println(filter.count());
System.out.println(filter.isEmpty());
filter.clear();
System.out.println(filter.isEmpty());
System.out.println(filter.contains("filter"));
}
}
Don't forget auth password, you better use the configured redis client connection. It should be noted that bloomfilter is not responsible for closing and returning redis connection resources.
If you are not familiar with redis, you can refer to the following hyperlink:
https://redis.io
https://github.com/xetorthio/jedis
Test result is :
true
false
true
true
false
false
3
false
true
false
If you want to use your own data structure, you can directly implement the BaseBitSet
interface.
Example:
public class YourBitSet implements BaseBitSet {
private int[] data;//boolean array
public YourBitSet(int size) {
data = new int[size];
}
@Override
public void set(int bitIndex) {
data[bitIndex] = 1;
}
@Override
public void set(int bitIndex, boolean value) {
if (value)
data[bitIndex] = 1;
else data[bitIndex] = 0;
}
@Override
public boolean get(int bitIndex) {
return data[bitIndex] == 1;
}
@Override
public void clear(int bitIndex) {
data[bitIndex] = 0;
}
@Override
public void clear() {
Arrays.fill(data, 0);
}
@Override
public long size() {
long size = 0;
for (int d : data)
if (d == 1)
size++;
return size;
}
@Override
public boolean isEmpty() {
return size() <= 0;
}
}
Test:
public class YourBitSetTest {
public static void main(String[] args) {
//(falsePositiveProbability, expectedNumberOfElements)
BloomFilter<String> filter = new BloomFilter<String>(0.0001, 10000);
filter.bind(new YourBitSet(1000000));
filter.add("filter");
System.out.println(filter.contains("filter"));
System.out.println(filter.contains("bloom"));
filter.add("bitset");
filter.add("redis");
System.out.println(filter.contains("bitset"));
System.out.println(filter.contains("redis"));
System.out.println(filter.contains("mysql"));
System.out.println(filter.contains("linux"));
System.out.println(filter.count());
System.out.println(filter.isEmpty());
filter.clear();
System.out.println(filter.isEmpty());
System.out.println(filter.contains("filter"));
}
}
Test result is :
true
false
true
true
false
false
3
false
true
false
Bloomfilter is released under the GNU Lesser General Public License v3.0.
You can just use maven to import dependency package or use your own compiled jar package.
You may copy this code directly into your project if you leave the LGPL-comment in place and reference the following hyperlink:
https://github.com/MagnusS/Java-BloomFilter
https://github.com/wxisme/bloomfilter