This repository contains two branches:
- The main, which contains the original
HBFT-SD
implmentation. - The
NCF-HBFT-SD
implementation, which is removes common blocks.
This tool is an adaptation of MRSH-HBFT
, a similarity search tool based on MRSH-V2
. This implementation changes the feature extraction tool, which originally is MRSH-V2
, to sdhash
, aiming to improve its performance.
- D. Lillis, F. Breitinger, M. Scanlon, Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees. In: Matoušek P., Schmiedecker M. (eds) Digital Forensics and Cyber Crime. ICDF2C 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineer- ing, vol 216. Springer, Cham, 2018 (doi: [10.1007/978-3-319-73697-6_11](https://dx.doi.org/10.1007/978-3-319-73697-6_11)).
- D. Lillis, F. Breitinger and M. Scanlon. Hierarchical Bloom Filter Trees for Approximate Matching. Journal of Digital Forensics, Security and Law (JDFSL), 13(1), 2018.
- Velho, J. P. B., Moia, V. H. G., and Henriques, M. A. A. (2020). Entendendo e melhorando a capacidade de detecçao de estrategias de busca de similaridade em investigações forenses. XX Brazilian Symposium on information and computational systems security, Brazilian Computer Society (SB).
- Boost
- OpenSSL (for sha1 hash)
- CMake
-
Extract the library folder onto
HBFT-SD
's main folder -
Inside boost folder Run:
$ ./bootstrap.sh --prefix = /usr/local/
$ ./b2 install
- The setup is complete!
- Makefile:
$ cmake .
$ make
- Run:
$ ./hbft_sd
There are four main parameters to configurate:
- The directory/file used to populate the tree:
File ./src/main.c
char *tree_dirname = "<directory_or_file_path>";
- The directory/file the user wants to search in the tree:
File ./src/main.c
char *search_dirname = "./src/helper.c";
- The number of leaves used on the tree
File ./src/main.c
int leaf_num = <number>;
- Minimal number of consecutive features to consider a match:
File ./src/main.c
File ./header/config.h
mode->min_run = <number>;
#define MIN_RUN <number>
- The Root Bloom Filter Size:
File ./header/config.h
#define BF_SIZE <number>
- The original implementation used a memory threshold given by the user to calculate the root Bloom filter size. This implementation uses the size configured on
BF_SIZE
(config.h
file), which is enable by the#define NEW_SIZE
. IfNEW_SIZE
is not enabled, the tool will use the memory threshold defined atmain.c
file withunsigned long mem_upper_limit
variable, however this funcionality was not tested onHBFT-SD
so be careful. - Originally,
MRSH-HBFT
, allowed multiple files inserted on a single leaf node,HBFT-SD
allows only one file per leaf node. This was implemented aiming to improve the tool's performance. - Enable
#define LOGGING
(config.h
) to print debbuging information when using this tool.