ramboDB combines LevelDB with RAMBO.
RAMBO: Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)
RAMBO is a method for reducing the query cost of sequence search over an archive of dataset files, addressing the sheer scale and rapid growth of new sequence files. It achieves sublinear query time, O(\sqrt{K} \log K) in the number of files K, with a memory requirement only slightly above the information-theoretic limit.
This code is the implementation of: https://dl.acm.org/doi/10.1145/3448016.3457333 for gene sequence search.
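To make the idea concrete, here is a minimal toy sketch of a repeated-and-merged Bloom filter index. It is not the code in this repository: the class names `ToyBloom` and `ToyRambo`, the salted `std::hash` probes, and the fixed choice of 3 hash probes are all assumptions made for illustration. The parameter names m, B and R follow the ones used in src/main.cpp; K is the number of datasets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Toy Bloom filter (hypothetical, for illustration): m bits, k salted std::hash probes.
class ToyBloom {
public:
    ToyBloom(std::size_t m, int k) : bits_(m, false), k_(k) { assert(m > 0 && k > 0); }

    void insert(const std::string& key) {
        for (int i = 0; i < k_; ++i) bits_[probe(key, i)] = true;
    }

    // May report a false positive, never a false negative.
    bool maybeContains(const std::string& key) const {
        for (int i = 0; i < k_; ++i)
            if (!bits_[probe(key, i)]) return false;
        return true;
    }

private:
    std::size_t probe(const std::string& key, int i) const {
        return std::hash<std::string>{}(key + '#' + std::to_string(i)) % bits_.size();
    }
    std::vector<bool> bits_;
    int k_;
};

// Toy index: R repetitions, each partitioning K datasets into B merged Bloom filters.
class ToyRambo {
public:
    ToyRambo(std::size_t m, int B, int R, int K)
        : B_(B), R_(R),
          filters_(R, std::vector<ToyBloom>(B, ToyBloom(m, 3))),
          members_(R, std::vector<std::set<int>>(B)) {
        assert(B > 0 && R > 0 && K > 0);
        // Assign every dataset id in [0, K) to one bucket per repetition.
        for (int r = 0; r < R_; ++r)
            for (int d = 0; d < K; ++d) members_[r][bucketOf(d, r)].insert(d);
    }

    // Adds `term` to the merged filter that holds `dataset` in every repetition.
    void insert(int dataset, const std::string& term) {
        for (int r = 0; r < R_; ++r) filters_[r][bucketOf(dataset, r)].insert(term);
    }

    // Per repetition, unions the members of all buckets that report the term,
    // then intersects those unions across repetitions.
    std::set<int> query(const std::string& term) const {
        std::set<int> result;
        for (int r = 0; r < R_; ++r) {
            std::set<int> hit;
            for (int b = 0; b < B_; ++b)
                if (filters_[r][b].maybeContains(term))
                    hit.insert(members_[r][b].begin(), members_[r][b].end());
            if (r == 0) {
                result.swap(hit);
            } else {
                std::set<int> both;
                std::set_intersection(result.begin(), result.end(), hit.begin(), hit.end(),
                                      std::inserter(both, both.begin()));
                result.swap(both);
            }
        }
        return result;
    }

private:
    int bucketOf(int dataset, int r) const {
        std::size_t h =
            std::hash<std::string>{}(std::to_string(dataset) + '/' + std::to_string(r));
        return static_cast<int>(h % static_cast<std::size_t>(B_));
    }
    int B_, R_;
    std::vector<std::vector<ToyBloom>> filters_;
    std::vector<std::vector<std::set<int>>> members_;
};
```

A query probes B x R merged filters instead of one filter per file, which is where a cost that grows sublinearly in K can come from when B and R grow slowly with K. The result may contain false positives but never misses a dataset that actually holds the term. Dataset ids passed to `insert` are assumed to lie in [0, K).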
If you use RAMBO in an academic context or for any publication, please cite our paper:
@inproceedings{10.1145/3448016.3457333,
author = {Gupta, Gaurav and Yan, Minghao and Coleman, Benjamin and Kille, Bryce and Elworth, R. A. Leo and Medini, Tharun and Treangen, Todd and Shrivastava, Anshumali},
title = {Fast Processing and Querying of 170TB of Genomics Data via a Repeated And Merged BloOm Filter (RAMBO)},
year = {2021},
isbn = {9781450383431},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3448016.3457333},
doi = {10.1145/3448016.3457333},
pages = {2226–2234},
numpages = {9},
keywords = {information retrieval, bloom filter, genomic sequence search},
location = {Virtual Event, China},
series = {SIGMOD/PODS '21}
}
Before running:
- Install Boost
(Ubuntu)
sudo apt-get install libboost-all-dev
- Install TBB
Intel TBB is used for the parallel option.
(Ubuntu)
git clone https://github.com/wjakob/tbb.git
cd tbb
mkdir build
cd build
cmake ..
make -j
sudo make install
- Set parameters and run the code:
  start and end: lines 40-52 of src/main.cpp (the start and end block numbers)
  m, B and R: lines 30-32 of src/main.cpp (the three Bloom filter parameters)
  data filename: line 45 of src/main.cpp. The test file should follow the format of test_mofified.txt.
Run the RAMBO test:
mkdir build
cd build
cmake ..
make
./rambo_test answer.txt 0
argv[1] is the name of the file in which RAMBO's performance is recorded.
argv[2] is 0 or 1 and selects how the search result is stored (0 for a bitset, 1 for std::set).
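As an illustration of the two arguments described above, the following sketch validates them before a run. It is an assumption about how such parsing could look, not the code in src/main.cpp: the names `RunOptions`, `ResultStore`, `parseRunOptions` and `exampleUsage` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names for the two result representations described above.
enum class ResultStore { Bitset, StdSet };

struct RunOptions {
    std::string perfFile;                     // argv[1]: file that records performance
    ResultStore store = ResultStore::Bitset;  // argv[2]: 0 = bitset, 1 = std::set
};

// Expects the arguments after the program name: <perf-file> <0|1>.
// Returns true and fills `out` on success; leaves `out` untouched otherwise.
bool parseRunOptions(const std::vector<std::string>& args, RunOptions& out) {
    if (args.size() != 2) return false;
    if (args[1] != "0" && args[1] != "1") return false;
    out.perfFile = args[0];
    out.store = (args[1] == "0") ? ResultStore::Bitset : ResultStore::StdSet;
    return true;
}

// Usage sketch mirroring `./rambo_test answer.txt 0`.
inline void exampleUsage() {
    RunOptions opts;
    bool ok = parseRunOptions({"answer.txt", "0"}, opts);
    assert(ok && opts.store == ResultStore::Bitset);
    (void)ok;  // avoids an unused-variable warning when asserts are disabled
}
```

In a real `main`, the vector could be built as `std::vector<std::string>(argv + 1, argv + argc)`. Rejecting anything other than 0 or 1 up front avoids silently falling back to one of the two storage modes.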