The program walks a directory tree, recording every file in a map keyed by file size. Then, for each set of files that share a size, it hashes each file and reports the duplicates along with their hashes.
For a sample of 14,287 files, around 3.5 GB:

rmlint -r : 0m9.553s
fdupes -r : 0m14.612s
dedup     : 0m11.646s
Before each run, disk caches were cleared:
$ free && sync && echo 3 >| /proc/sys/vm/drop_caches && free
After cloning the repository, build and run:
cd dedup
make
./dedup <dir1> <dir2> ...
Note that the program depends on OpenSSL (libcrypto) for hashing.