This utility compares the contents of files to check if any of them match. What is considered a match depends on the chosen method; three methods are available:
- Heuristic comparison (very fast)
- Heuristic comparison with trimming (useful for text and video files, or any files with padding bytes at the end)
- Precise comparison (slow but accurate)
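Standard tools can illustrate how a heuristic comparison differs from a precise one. This is a generic sketch, not finddup's implementation, and it does not describe finddup's actual heuristic. The 4 KiB sample size and the function names heuristic_match and precise_match are assumptions made for this example. The heuristic compares only the file size and a checksum of the first 4096 bytes, so it reads little data but can report false matches. The precise check reads every byte.

```shell
# Generic illustration only; NOT finddup's actual algorithm.

# Heuristic (hypothetical): two files "match" if they have the same size
# and the same checksum over their first 4096 bytes.
heuristic_match() {
  [ "$(wc -c < "$1" | tr -d ' ')" = "$(wc -c < "$2" | tr -d ' ')" ] &&
    [ "$(head -c 4096 "$1" | cksum)" = "$(head -c 4096 "$2" | cksum)" ]
}

# Precise: two files match only if every byte is identical.
# cmp -s prints nothing and signals the result through its exit status.
precise_match() {
  cmp -s "$1" "$2"
}
```

Suppose two 5000-byte files differ only in their last byte. The heuristic treats them as a match, but the precise comparison does not.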
For further processing of the results, you can choose between seven output modes:
- One match per line
- Original with a list of its duplicates
- Duplicate and original each on a separate line
- Only duplicates/originals
- Smallest/largest duplicates
- Oldest/newest duplicates
- Only unique files
There are many more options that let you control which files are ignored, which files should be compared, how the utility should handle symbolic links, and whether to look for files in subdirectories.
So far, I have used finddup only on macOS, so I can only describe how to install it on a Mac, although the instructions should work just as well on Linux.
If Homebrew is installed, you can run this command:
brew install vbwx/utils/finddup
To install finddup manually instead:
- Download and extract the latest release of finddup.
- If desired, move the completion script(s) to the appropriate location on your system:
  - For Bash, move completion/finddup to a directory like /etc/bash_completion.d.
  - For Zsh, move completion/_finddup to a directory like /usr/share/zsh/site-functions.
- Make sure you have at least version 5.18 of Perl installed. (Run perl -v to check.)
- Run the following command:
cpan .
Alternatively, if you have cpanminus installed and want more flexibility regarding installation directories, you can run these commands:
cpanm --installdeps .
perl Makefile.PL INSTALL_BASE=...
make
make install
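With INSTALL_BASE, ExtUtils::MakeMaker installs executables in INSTALL_BASE/bin and Perl modules in INSTALL_BASE/lib/perl5. Both directories usually have to be added to the search paths before finddup can run. Below is a minimal sketch for a shell startup file. The directory $HOME/perl5 is a hypothetical choice, so substitute whatever directory you passed to Makefile.PL.

```shell
# Hypothetical install location: use the same directory that was
# passed to "perl Makefile.PL INSTALL_BASE=...".
install_base="$HOME/perl5"

# Executables such as finddup are installed in INSTALL_BASE/bin.
export PATH="$install_base/bin:$PATH"

# Modules are installed in INSTALL_BASE/lib/perl5. The ${VAR:+...} form
# appends the previous PERL5LIB value only if one was already set.
export PERL5LIB="$install_base/lib/perl5${PERL5LIB:+:$PERL5LIB}"
```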
Run finddup --help to get a quick overview of how to use this utility.
The following command calculates how much storage is taken up by duplicates in the entire file hierarchy of the working directory.
finddup -ra0 | xargs -0 du -ch --
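The pairing with xargs -0 suggests that the -0 option makes finddup separate file names with NUL bytes. This is the format xargs -0 expects, and it keeps names that contain spaces or newlines intact. In the following sketch, printf stands in for finddup's output, which is an assumption about the exact format. This lets you try the pipeline without any duplicates at hand.

```shell
# Two small files in a temporary directory. The first name contains a
# space, which a whitespace-separated list would split into two words.
dir=$(mktemp -d)
printf 'duplicate\n' > "$dir/copy one.txt"
printf 'duplicate\n' > "$dir/copy two.txt"

# printf '%s\0' stands in for "finddup -ra0" and prints one path per NUL byte.
# du -c appends a grand total and -h prints human-readable sizes.
# "--" stops du from treating a path that starts with "-" as an option.
printf '%s\0' "$dir/copy one.txt" "$dir/copy two.txt" | xargs -0 du -ch --
```

The output lists each file with its disk usage, followed by a line labeled "total".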
Here is how to delete the newest exact copies of files located in different directories (in other words, keep only the originals):
finddup -pC0 some_folder another_folder | xargs -0 rm -f
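Because rm -f deletes files without asking, it may be worth reviewing the list first. Below is a minimal sketch using a hypothetical helper named preview_paths, which prints each NUL-separated path on its own line and deletes nothing.

```shell
# Hypothetical helper: read NUL-separated paths from standard input and
# print one per line for review. Nothing is deleted.
preview_paths() {
  xargs -0 printf '%s\n'
}

# Intended use, with the same finddup options as the deletion command:
#   finddup -pC0 some_folder another_folder | preview_paths
```

If the list looks right, you can then run the original command ending in xargs -0 rm -f.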
Instead of running diff in a loop, you can use finddup to determine which files have changed, even across multiple copies of a directory.
finddup -rn folder-v*/
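For comparison, the diff-in-a-loop approach might look like the following sketch. The function name changed_files is hypothetical, and the sketch is limited to two copies and top-level files for illustration. It uses diff -q, which only reports whether two files differ.

```shell
# Hypothetical helper: print the names of top-level regular files in
# directory $1 whose counterpart in directory $2 differs or is missing.
changed_files() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
    name=${f##*/}                    # file name without the directory part
    # diff -q exits 0 if the files are identical, 1 if they differ, and 2
    # on trouble (for example a missing counterpart); non-zero is reported.
    if ! diff -q "$f" "$2/$name" > /dev/null 2>&1; then
      printf '%s\n' "$name"
    fi
  done
}

# Example call with two hypothetical copies of a directory:
#   changed_files folder-v1 folder-v2
```

This loop compares only two copies at a time and ignores subdirectories. The single finddup command above is meant to replace that kind of bookkeeping.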
You can find a detailed explanation of all options, a tutorial, and more technical information in the User Manual.