This is a native go implementation of spamsum.
spamsum was developed by Andrew Tridgell to hash email messages for computationally inexpensive SPAM detection. See http://junkcode.samba.org/#spamsum.
- Ready for production use.
- It seems to generate results identical to that of the spamsum tool and ssdeep. This has only been tested on a small number of files.
- It is about twice as slow as the spamsum tool; about 40MB/s on a 3Ghz Core i3. Use
gccgo
to make the speed difference disappear. - Fuzzy comparison may be slower than the spamsum tool. Benchmark forthcoming.
Unfortunately, the default operation for spamsum is to iterate over the data several times to determine an optimal block size, so it's not sensible to implement the hash.Hash
interface.
Instead, the package exports the functions HashBytes(b [] byte)
and HashReadSeeker(source io.ReadSeeker, length int64)
.
if file, err := os.Open("filename"); err != nil {
log.Fatal(err)
} else if stat, err := file.Stat(); err != nil {
log.Fatal(err)
} else {
sum, err := spamsum.HashReadSeeker(file, stat.Size())
// etc.
}
Any errors returned by HashReadSeeker
will originate from the io.ReadSeeker
functions.
If it is acceptable to set a fixed blocksize beforehand, the SpamSumWriter
type can be used, which does implement the hash.Hash
interface. The Sum(b []byte) []byte
method is not terribly useful; it will return a slice where the non-zero bytes contain a base64-encoded 6-bit hash for a BlockSize()
-sized block. Use the String()
method to obtain a more useful representation.
Use of this code is governed by version 2.0 or later of the Apache License, available at http://www.apache.org/licenses/LICENSE-2.0
The extra_tests
directory contains modified sources of the original spamsum tool as part of a regression test suite. This code is joint licensed under the Perl Artistic License and the GPL version 2 or later.