/mrsh-sd

A similarity search strategy based on mrsh-net and modified to take advantage of sdhash characteristics.

Primary LanguageC++

Warning

This repository was maintained due to historical reasons. To use the latest version of this tool, please access https://github.com/regras/net-sd.

mrsh-sd

A similarity search strategy based on mrsh-net and modified to take advantage of sdhash characteristics.

Instructions

Boost:

  1. Extract the library folder onto mrsh-sd's main folder
  2. Inside the boost folder run:
$ ./bootstrap.sh
  1. Run:
$ ./b2
  1. Run and the setup is complete!
$ ./b2 install

Compiling:

  1. Run:
$ make

Generating a bloom filter based on a list of files:

  1. List the files into a .txt file <list.txt>
  2. Run:
$./mrsh_net -d <list.txt> > digest

To compare a target list of files against a previously generated digest:

  1. List the target files into a .txt file <target.txt>
  2. Run:
$ ./mrsh_net -i digest <target.txt>

Additional configs

To modify the output format:

File ./src/main.c
Function: evaluation
Line 383 for the matches

 printf("%s:%d of %d(Longest run %d)\n", name, found, total,longest_run);

Line 387 for the non matches

printf("%s not found, min long run not long enough \n",name);

To modify the threshold of consecutive features

File ./header/config.h
Line 16 - min_run

#define MIN_RUN <number>

To change the Bloom Filter size

File ./header/config.h
Line 22
Obs.: The size must be power of two and expressed in bytes.