Selectable checksum algorithms and exposing the checksum in the output pattern

Question


dkasak opened this issue 7 years ago · 1 comment

It would be nice if the checksum algorithm that radamsa uses internally for test-case deduplication were selectable.

As aoh told me on IRC, radamsa uses a custom 96-bit checksum for its uniqueness filter (i.e. deduplication). It originally used SHA256 for this purpose, but that was replaced with a simpler, leaner stream algorithm for better performance and lower memory usage. However, in certain workflows one may want truly unique files, so it makes sense to spend a few more resources on computing a higher-quality hash (e.g. SHA256).
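To illustrate the trade-off, here is a minimal sketch of a checksum-based uniqueness filter with a selectable hash. radamsa's own 96-bit stream checksum is not reproduced here: as a clearly labeled stand-in, `checksum_96` truncates BLAKE2b to 12 bytes. The algorithm names `stream96` and `sha256` and the function `unique` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Iterable, Iterator, Set


def checksum_96(data: bytes) -> bytes:
    # Stand-in 96-bit checksum: BLAKE2b with a 12-byte digest.
    # This is NOT radamsa's internal stream checksum.
    return hashlib.blake2b(data, digest_size=12).digest()


def checksum_sha256(data: bytes) -> bytes:
    # Full 256-bit SHA256 digest: more CPU and memory per stored entry.
    return hashlib.sha256(data).digest()


# Hypothetical algorithm names, selectable at run time.
HASHES: Dict[str, Callable[[bytes], bytes]] = {
    "stream96": checksum_96,
    "sha256": checksum_sha256,
}


def unique(cases: Iterable[bytes], algo: str = "stream96") -> Iterator[bytes]:
    """Yield only the test cases whose checksum has not been seen yet."""
    digest = HASHES[algo]
    seen: Set[bytes] = set()
    for case in cases:
        h = digest(case)
        if h in seen:
            # Same checksum as an earlier case: a true duplicate,
            # or (unlikely) a collision that drops a unique case.
            continue
        seen.add(h)
        yield case
```

For example, `list(unique([b"a", b"b", b"a"], algo="sha256"))` yields `[b"a", b"b"]`. The only behavioral difference between the two choices is collision risk: with a shorter checksum, a collision can make the filter discard a case that was actually unique, while a longer hash makes that even less likely at the cost of more work and memory per entry.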

Additionally, if this is implemented, it would be nice if the checksum were exposed as an output pattern specifier (e.g. %h for hash, or similar), since that would let files generated by radamsa be deduplicated automatically at the filesystem level. This fits nicely into workflows that already use the same strategy to ensure test-case uniqueness in a corpus.
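A rough sketch of the filesystem-level deduplication this would enable, assuming a hypothetical `%h` specifier that expands to the SHA256 hex digest of the output. `%h`, `expand_pattern`, and `write_case` are illustrative names only, not existing radamsa options or APIs. Because the file name is derived from the content, writing identical content twice lands on the same path.

```python
import hashlib
from pathlib import Path


def expand_pattern(pattern: str, data: bytes) -> str:
    # Replace the proposed (hypothetical) %h specifier with the
    # SHA256 hex digest of the test case contents.
    return pattern.replace("%h", hashlib.sha256(data).hexdigest())


def write_case(out_dir: Path, pattern: str, data: bytes) -> bool:
    """Write data to a content-addressed path; return False if it already existed."""
    path = out_dir / expand_pattern(pattern, data)
    if path.exists():
        # Identical content is already in the corpus under this name.
        return False
    path.write_bytes(data)
    return True
```

Writing `b"abc"` twice with the pattern `case-%h.bin` creates a single file, `case-ba7816bf...15ad.bin`, and the second call returns `False`.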

Answer 1 · 2017-12-17T08:39:18.000Z

First part done. The hash function can now be selected with -H. SHA256 digests are converted internally into suitable 3-byte chunks for storage in the hash tree, so the existing checksum store can be reused for them as well. The checksum is currently shown in the metadata, but it is not yet available for use in file names.
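As an illustration of the chunking idea only (radamsa's actual tree layout is not described here, so this is an assumption), the sketch below splits a SHA256 digest into 3-byte integers and stores them as a path in a nested-dict trie. `digest_chunks` and `HashTree` are hypothetical names. Note that 32 bytes does not divide evenly by 3, so the last chunk is only 2 bytes.

```python
import hashlib
from typing import Dict, List


def digest_chunks(data: bytes, size: int = 3) -> List[int]:
    """Split the SHA256 digest of data into size-byte big-endian integers.

    A 32-byte digest yields ten 3-byte chunks plus one 2-byte tail.
    """
    d = hashlib.sha256(data).digest()
    return [int.from_bytes(d[i:i + size], "big") for i in range(0, len(d), size)]


class HashTree:
    """Nested-dict trie keyed by digest chunks (illustrative only)."""

    def __init__(self) -> None:
        self.root: Dict[int, dict] = {}

    def add(self, chunks: List[int]) -> bool:
        """Insert chunks as a path; return True if the path was new.

        All paths have the same length here, so no path is a prefix of another.
        """
        node = self.root
        new = False
        for c in chunks:
            if c not in node:
                node[c] = {}
                new = True
            node = node[c]
        return new
```

Adding `digest_chunks(b"abc")` returns `True` the first time and `False` the second, which is exactly the test a uniqueness filter needs. Keeping one fixed chunk size is what lets digests of different lengths share the same store.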

sol:~/src/radamsa$ ./bin/radamsa -H sha256 -M - -o out rad/*
seed: 828407099649152197717268
muta-num: 1, source: "rad/mutations.scm", generator: file, checksum: "a1b7541834709043aa7355dc89d8c6fdfd4115a18f64f5728b27aa63782b731a", nth: 1, path: "out", output: file-writer, length: 48645, pattern: once-dec
sol:~/src/radamsa$ sha256sum out
a1b7541834709043aa7355dc89d8c6fdfd4115a18f64f5728b27aa63782b731a  out
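The session above checks by hand that the metadata checksum matches `sha256sum`. A small sketch that automates the same check, assuming the metadata line has the `checksum: "<hex>"` field shown above (the format may differ in other versions); `metadata_checksum`, `sha256_file`, and `verify` are hypothetical helpers.

```python
import hashlib
import re
from typing import Optional


def metadata_checksum(line: str) -> Optional[str]:
    # Extract the quoted hex checksum field from a -M metadata line.
    m = re.search(r'checksum: "([0-9a-f]+)"', line)
    return m.group(1) if m else None


def sha256_file(path: str) -> str:
    # Same result as `sha256sum path`, reading in blocks to bound memory use.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def verify(line: str, path: str) -> bool:
    """True if the checksum in the metadata line matches the file's SHA256."""
    c = metadata_checksum(line)
    return c is not None and c == sha256_file(path)
```

Run against the metadata line and the `out` file from the session above, `verify` should return `True`, mirroring the manual `sha256sum out` comparison.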