/codable-bloom-filter

An implementation of the Bloom filter data structure conforming to Swift's Codable serialization protocol

Primary LanguageSwiftMIT LicenseMIT

CodableBloomFilter

An implementation of the Bloom filter data structure conforming to Swift's Codable serialization protocol.

Installation

Swift Package Manager

Add a dependency to https://github.com/metabolist/codable-bloom-filter.git, either in Xcode or by adding it to the dependencies array in your Package.swift:

dependencies: [
    .package(name: "CodableBloomFilter", url: "https://github.com/metabolist/codable-bloom-filter.git", .upToNextMajor(from: "1.0.0"))
],

CocoaPods

Add the following to your Podfile:

pod 'CodableBloomFilter', '~> 1.0'

Usage

For a type to be usable in a bloom filter, it must conform to the DeterministicallyHashable protocol. An implemenation for Data and String is included.

Bloom filters are initialized with a set of hash functions and a count of bytes to use. CodableBloomFilter provides 5 different hash functions (32-bit versions of DJB2, DJB2a, SDBM, FNV1, and FNV1a). For determining the number of hash functions and size for your use case, see https://hur.st/bloomfilter/ or https://en.wikipedia.org/wiki/Bloom_filter#Optimal_number_of_hash_functions.

Note that while a size of 8 bytes is used in the following examples to keep the output readable, in practice you will likely want something larger.

A bloom filter can be used similarly to a Set:

import CodableBloomFilter

var bloomFilter = BloomFilter<String>(hashes: [.sdbm32, .djb232], byteCount: 8)

bloomFilter.insert("string")

bloomFilter.contains("string") // returns true
bloomFilter.contains("other string") // returns false

To serialize a bloom filter:

let data = try JSONEncoder().encode(bloomFilter)
// `data` will be JSON data equivalent to "{"hashes":["djb232","sdbm32"],"data":"AAAAAAAAAhA="}"

To deserialize a bloom filter:

let bloomFilter = try JSONDecoder().decode(BloomFilter<String>.self, from: data)

Using a custom dataEncodingStrategy / dataDecodingStrategy is supported. For example, to have the data serialized as a hexadecimal string:

let encoder = JSONEncoder()

encoder.dataEncodingStrategy = .custom { data, encoder in
    var container = encoder.singleValueContainer()

    try container.encode(data.map { String(format: "%02.2hhx", $0) }.joined())
}

let data = try encoder.encode(bloomFilter)
// `data` will be JSON data equivalent to "{"hashes":["djb232","sdbm32"],"data":"0000000000000210"}"

Bloom filters can also be initialized from Data. To initialize the bloom filter above:

let bloomFilter = BloomFilter<String>(hashes: [.sdbm32, .djb232], data: Data([0, 0, 0, 0, 0, 0, 2, 16]))

A bloom filter's data can be accessed via its data property:

let data = bloomFilter.data // `data` will be equivalent to `Data([0, 0, 0, 0, 0, 0, 2, 16])`

License

CodableBloomFilter is released under the MIT license. See LICENSE for details.