/p5-Digest-bBitMinHash

Perl implementation of b-Bit Minwise Hashing algorithm

Primary LanguagePerlOtherNOASSERTION

NAME

Digest::bBitMinHash - Perl implementation of b-Bit Minwise Hashing algorithm

SYNOPSIS

use Digest::bBitMinHash;

my $bbmh = Digest::bBitMinHash->new({"b"=>2, "k"=>128,});
# Or my $bbmh = Digest::bBitMinHash->new(2, 128);

my @data1 = split / /, "Giants Nakai left-knee ligaments injury entry wiped";
my @data2 = split / /, "Nakai left-knee entry wiped Tigers right-shoulder Osaka";

my $vectors1 = $bbmh->get_bit_vectors(\@data1);
my $vectors2 = $bbmh->get_bit_vectors(\@data2);
# Or $vectors1 = $bbmh->get(\@data1);

my $match_bit_count = $bbmh->compare_bit_vectors($vectors1, $vectors2);
# Or $match_bit_count = $bbmh->compare($vectors1, $vectors2);

my $score = $bbmh->estimate_resemblance(\@data1, \@data2, $match_bit_count);

# $score is under 0.8. So @data1 and @data2 are not similar.
#
# And actually, you can estimate only using estimate() with the two elements arrays.
#
# my $bbmh = Digest::bBitMinHash->new({"b"=>2, "k"=>128,});
# my $score = $bbmh->estimate(\@data1, \@data2);

DESCRIPTION

Digest::bBitMinHash is the Perl implementation of b-Bit Minwise Hashing algorithm.

LICENSE

Copyright (C) 2013 by Toshinori Sato (@overlast).

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

Toshinori Sato (@overlast) overlasting@gmail.com