Finds duplicate files quickly and efficiently.

Primary language: Perl. License: MIT.

finddup

Contents
  1. Description
  2. Installation
    1. Installation via Homebrew
    2. Manual Installation
  3. Usage
    1. Examples
    2. Manual

Description

This utility compares the contents of files to check if any of them match. What is considered a match depends on the chosen method; three methods are available:

  • Heuristic comparison (very fast)
  • Heuristic comparison with trimming (useful for text and video files, or any files with padding bytes at the end)
  • Precise comparison (slow but accurate)
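To make the difference between a precise comparison and a comparison with trimming concrete, here is a minimal shell sketch. It does not show how finddup works internally; it only assumes, for illustration, that "padding" means trailing NUL bytes, and it uses `cmp` and a hypothetical `trim` helper in place of finddup's own methods.

```shell
#!/bin/sh
# Illustration only: this is NOT finddup's implementation.
# Assumption: "padding" means trailing NUL bytes.
tmp=$(mktemp -d)
printf 'same content' > "$tmp/a"
printf 'same content\000\000\000' > "$tmp/b"   # same data plus 3 padding bytes

# Precise comparison: byte for byte, so the padded copy does not match.
if cmp -s "$tmp/a" "$tmp/b"; then precise=match; else precise=differ; fi

# Hypothetical helper: print a file with its trailing NUL bytes removed.
trim() { perl -0777 -pe 's/\0+\z//' "$1"; }

# Comparison after trimming: the padded copy now matches.
trim "$tmp/a" > "$tmp/a.trim"
trim "$tmp/b" > "$tmp/b.trim"
if cmp -s "$tmp/a.trim" "$tmp/b.trim"; then trimmed=match; else trimmed=differ; fi

echo "precise: $precise, trimmed: $trimmed"
rm -rf "$tmp"
```

Under these assumptions, the two files differ in a precise comparison but match once the trailing bytes are ignored, which is the situation the trimming method is meant for.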

For further processing of the results, you can choose between seven output modes:

  • One match per line
  • Original with a list of its duplicates
  • Duplicate and original each on a separate line
  • Only duplicates/originals
  • Smallest/largest duplicates
  • Oldest/newest duplicates
  • Only unique files

There are many more options that let you control which files are ignored, which files should be compared, how the utility should handle symbolic links, and whether to look for files in subdirectories.

Installation

So far, I have used finddup only on macOS, so these instructions describe installation on a Mac; they should work just as well on Linux.

Installation via Homebrew

If Homebrew is installed, you can run this command:

brew install vbwx/utils/finddup

Manual Installation

  1. Download and extract the latest release of finddup.
  2. If desired, move the completion script(s) to the appropriate location on your system.
    • Move completion/finddup to a directory like /etc/bash_completion.d.
    • Move completion/_finddup to a directory like /usr/share/zsh/site-functions.
  3. Make sure you have at least version 5.18 of Perl installed. (Run perl -v to check.)
  4. Run the following command from the extracted directory:
cpan .

Alternatively, if you have cpanminus installed and want more flexibility with regard to installation directories, you can run these commands:

cpanm --installdeps .
perl Makefile.PL INSTALL_BASE=...
make
make install
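
The Perl version requirement from the manual installation steps can also be checked non-interactively, which may be handy in a setup script. The following is a small sketch that relies on Perl's own `use VERSION` check; the variable name `perl_ok` is only illustrative.

```shell
# `use 5.018;` makes perl exit with an error if the interpreter is older than 5.18.
# A missing perl binary also ends up in the "no" branch.
if perl -e 'use 5.018;' 2>/dev/null; then
  perl_ok=yes
else
  perl_ok=no
fi
echo "Perl >= 5.18 available: $perl_ok"
```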

Usage

Run finddup --help to get a quick overview of how to use this utility.

Examples

The following command calculates how much storage is taken up by duplicates in the entire file hierarchy of the working directory.

finddup -ra0 | xargs -0 du -ch --

Here is how to delete the newest exact copies of files located in different directories (i.e., keep only the originals):

finddup -pC0 some_folder another_folder | xargs -0 rm -f --
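
Because this pipeline deletes files, it can be worth previewing the list before handing it to rm. The sketch below shows that pattern on throwaway files. So that it runs without finddup, the function `list_duplicates` is a hypothetical stand-in for the finddup call; it assumes the output is a NUL-terminated list of paths, as the pairing of `-0` with `xargs -0` suggests.

```shell
#!/bin/sh
tmp=$(mktemp -d)
touch "$tmp/original.txt" "$tmp/copy one.txt" "$tmp/copy two.txt"

# Hypothetical stand-in for: finddup -pC0 some_folder another_folder
# Assumption: the real command emits NUL-terminated paths.
list_duplicates() {
  printf '%s\0' "$tmp/copy one.txt" "$tmp/copy two.txt"
}

# Step 1: preview. Count the NUL terminators, then list the files; nothing is deleted.
preview_count=$(list_duplicates | tr -cd '\0' | wc -c | tr -d ' ')
echo "would delete $preview_count file(s):"
list_duplicates | xargs -0 ls -l --

# Step 2: delete. Names containing spaces survive because of the NUL delimiters.
list_duplicates | xargs -0 rm -f --

remaining=$(ls "$tmp")
echo "remaining: $remaining"
rm -rf "$tmp"
```

In real use, the stand-in function would be replaced by the finddup command, run once with `ls -l` and, if the list looks right, once more with `rm -f`.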

Instead of running diff in a loop, you can use finddup to determine which files have changed, even across multiple copies of a directory:

finddup -rn folder-v*/

Manual

You can find a detailed explanation of all options, a tutorial, and more technical information in the User Manual.