Simple Perl script that checksums all files under one or more starting directories, including the files in subdirectories. It then outputs all files with duplicate checksums.
./find_duplicate_files.pl <DIR_1> [... <DIR_N> ]
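The approach described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual source: the checksum used by the real script is not stated here, so Digest::MD5 is assumed as a stand-in, and the helper names checksum_file and find_duplicates are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

# Return a hex checksum for one file, or undef if it cannot be opened.
# ASSUMPTION: MD5 is only a stand-in; the real script may use another checksum.
sub checksum_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    binmode($fh);
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $sum;
}

# Walk the given directories recursively and group file paths by checksum.
# Returns a list of array refs, one per checksum that occurs more than once.
sub find_duplicates {
    my @dirs = @_;
    my %by_sum;
    find({
        no_chdir => 1,    # keep $_ as the full path of each entry
        wanted   => sub {
            return unless -f $_ && !-l $_;    # regular files only, skip symlinks
            my $sum = checksum_file($_);
            push @{ $by_sum{$sum} }, $_ if defined $sum;
        },
    }, @dirs);
    return grep { @$_ > 1 } values %by_sum;
}

# Print each group of duplicates, separated by a blank line.
if (@ARGV) {
    print join("\n", sort @$_), "\n\n" for find_duplicates(@ARGV);
}
```

Grouping paths in a hash keyed by checksum means every file is read only once, and duplicates fall out as the hash entries that hold more than one path.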
-
Naturally, checksumming takes a long time, so be patient.
-
A possible optimisation would be to queue up all checksumming jobs and run them in one or more other processes. This is not implemented yet, so the script still takes a lot of time.
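One way such an optimisation might look is sketched below. It is not part of the script. It assumes the list of files is already known, splits it across a fixed number of forked worker processes, and collects "checksum<TAB>path" lines from each worker through a pipe. Digest::MD5 is again an assumed stand-in for the real checksum, and checksum_file and checksum_parallel are hypothetical names. Paths containing newlines are not handled.

```perl
use strict;
use warnings;
use Digest::MD5;
use POSIX ();

# Return a hex checksum for one file, or undef if it cannot be opened.
# ASSUMPTION: MD5 is only a stand-in for the script's real checksum.
sub checksum_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return;
    binmode($fh);
    my $sum = Digest::MD5->new->addfile($fh)->hexdigest;
    close($fh);
    return $sum;
}

# Checksum the files in @$files using up to $workers child processes.
# Returns a hash ref mapping path => checksum.
sub checksum_parallel {
    my ($files, $workers) = @_;
    my @readers;
    for my $w (0 .. $workers - 1) {
        # Worker $w takes every $workers-th file, starting at index $w.
        my @share = @$files[ grep { $_ % $workers == $w } 0 .. $#$files ];
        next unless @share;
        # Fork; the child's STDOUT is connected to $fh in the parent.
        my $pid = open(my $fh, '-|');
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            for my $path (@share) {
                my $sum = checksum_file($path);
                print "$sum\t$path\n" if defined $sum;
            }
            close(STDOUT);     # flush the results into the pipe
            POSIX::_exit(0);   # leave without running the parent's END blocks
        }
        push @readers, $fh;
    }
    my %sum_of;
    for my $fh (@readers) {
        while (my $line = <$fh>) {
            chomp $line;
            my ($sum, $path) = split /\t/, $line, 2;
            $sum_of{$path} = $sum;
        }
        close($fh);    # closing a piped handle also waits for the child
    }
    return \%sum_of;
}
```

A call such as checksum_parallel(\@files, 4) would use four workers. The parent drains the pipes one after another, so a worker with a lot of output may wait until its pipe is read. Whether this helps at all depends on whether the disk or the CPU is the bottleneck.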
-
The checksum is not cryptographically strong and is not intended to be secure. The usage scenario is finding simple file duplicates, not detecting deliberate attempts to spoof identical files.
GPL3, see COPYING
Alexander Köb nerdkram@koeb.me