
Document fingerprint generator

Primary language: Python. License: MIT.

Fingerprint -- Document Fingerprint Generator

Fingerprint of a document

A fingerprint is a signature of a document. Specifically, it is a representative subset of the hash values of all the k-grams (substrings of kgram_len characters) in the document. For more detail, see "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer, Wilkerson and Aiken (SIGMOD 2003), in particular Figure 2.
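To make the selection step concrete, here is a minimal sketch of winnowing as described in the paper: slide a window of window_len consecutive k-gram hashes over the document, pick the smallest hash in each window (the rightmost one on ties), and record it once as a (hash, position) pair. This is an illustration, not this package's implementation; the function name winnow and the (hash, position) output format are assumptions.

```python
from typing import List, Tuple


# Hypothetical helper for illustration; not part of the fingerprint package.
def winnow(hashes: List[int], window_len: int) -> List[Tuple[int, int]]:
    """Select fingerprints from a list of k-gram hashes by winnowing.

    In every window of `window_len` consecutive hashes, pick the minimum
    hash (the rightmost one on ties) and record it as (hash, position),
    unless that same position was already recorded for an earlier window.
    Returns an empty list if there are fewer hashes than `window_len`.
    """
    fingerprints: List[Tuple[int, int]] = []
    last_pos = -1
    # Slide the window one hash at a time over the sequence.
    for start in range(len(hashes) - window_len + 1):
        window = hashes[start:start + window_len]
        # Scan offsets right to left so min() returns the rightmost minimum.
        offset = min(range(window_len - 1, -1, -1), key=lambda i: window[i])
        pos = start + offset
        # Selected positions never move backwards, so comparing with the
        # last recorded position is enough to avoid duplicates.
        if pos != last_pos:
            fingerprints.append((hashes[pos], pos))
            last_pos = pos
    return fingerprints
```

For example, winnow([77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98], 4) returns [(17, 3), (17, 6), (8, 8), (39, 11), (17, 15)]. Because every window contributes a hash, any shared run of at least window_len consecutive k-grams between two documents yields at least one shared fingerprint.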

Super simple to use

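Create a Fingerprint object with the desired parameters, then call generate() with either a string (str=...) or a file path (fpath=...).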

```python
from fingerprint import Fingerprint

f = Fingerprint(kgram_len=4, window_len=5, base=10, modulo=1000)
print(f.generate(str="adorunrunrunadorunrun"))           # fingerprint a string
print(f.generate(fpath="/Users/test/docs/CHANGES.txt"))  # fingerprint a file
```

The default values for the parameters are:

kgram_len = 50
window_len = 100
base = 101
modulo = sys.maxint (Python 2 only; Python 3 removed sys.maxint, and sys.maxsize is the closest equivalent)
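The kgram_len, base and modulo parameters suggest a polynomial rolling hash over k-grams, in the style of the Karp-Rabin hash used in the winnowing paper. The sketch below assumes that scheme; kgram_hashes is a hypothetical name, and the package's actual hash function may differ. Each k-gram's hash is derived from the previous one in constant time instead of rehashing all kgram_len characters.

```python
from typing import List


# Hypothetical helper for illustration; the package's real hash may differ.
def kgram_hashes(text: str, kgram_len: int, base: int, modulo: int) -> List[int]:
    """Hash every k-gram of `text` with a polynomial rolling hash.

    hash(c_0 .. c_{k-1}) = (ord(c_0)*base^(k-1) + ... + ord(c_{k-1})) mod modulo
    """
    n = len(text)
    if kgram_len <= 0 or n < kgram_len:
        return []
    # Weight of the character that leaves the window: base^(k-1) mod modulo.
    high = pow(base, kgram_len - 1, modulo)
    # Hash the first k-gram directly (Horner's rule).
    h = 0
    for ch in text[:kgram_len]:
        h = (h * base + ord(ch)) % modulo
    hashes = [h]
    for i in range(kgram_len, n):
        # Remove the outgoing character, shift left by one position,
        # and add the incoming character. Python's % keeps h non-negative.
        h = ((h - ord(text[i - kgram_len]) * high) * base + ord(text[i])) % modulo
        hashes.append(h)
    return hashes
```

For example, kgram_hashes("ab", 2, 10, 1000) returns [68], since (97 * 10 + 98) mod 1000 = 68. A string of n characters yields n - kgram_len + 1 hashes, which is the sequence a winnowing step would then select fingerprints from.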

Install

pip install fingerprint