Go Find Duplicates


Introduction

A blazingly fast, simple-to-use tool to find duplicate files (photos, videos, music, documents, etc.) on your computer, portable hard drives, etc.

Note:

  • This tool just reads your files and creates a 'duplicates report' file
  • It does not delete or otherwise modify your files in any way 🙂
  • So, it's very safe to use 👍

How to install?

  1. Install Go 1.19 or later
  2. Run the command:
    go install github.com/jpconstantineau/go-find-duplicates@latest
  3. Add the following line to your .bashrc/.zshrc file:
    export PATH="$PATH:$HOME/go/bin"
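
To confirm the installation worked, you can print the tool's version (the --version flag is listed in the command line options below):

go-find-duplicates --version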

How to use?

go-find-duplicates {dir-1} {dir-2} ... {dir-n}
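
For example, to scan two directories, write a CSV report, and skip files smaller than 100 KiB (the two paths below are just placeholders for your own directories; the -o and -m flags are described in the next section):

go-find-duplicates -o csv -m 100 ~/Pictures /Volumes/PortableHD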

Command line options

Running go-find-duplicates --help displays the following:

go-find-duplicates is a tool to find duplicate files and directories

Usage:
  go-find-duplicates [flags] <dir-1> <dir-2> ... <dir-n>

where,
  arguments are readable directories that need to be scanned for duplicates

Flags (all optional):
  -x, --exclusions string   path to file containing newline-separated list of file/directory names to be excluded
                            (if this is not set, by default these will be ignored:
                            .DS_Store, System Volume Information, $RECYCLE.BIN etc.)
  -h, --help                display help
  -m, --minsize uint        minimum size of file in KiB to consider (default 4)
  -o, --output string       following modes are accepted:
                             text = creates a text file in current directory with basic information
                              csv = creates a csv file in current directory with detailed information
                            print = just prints the report without creating any file
                             json = creates a JSON file in the current directory with basic information
                             (default "text")
  -p, --parallelism uint8   extent of parallelism (defaults to number of cores minus 1)
  -t, --thorough            apply thorough check of uniqueness of files
                            (caution: this makes the scan very slow!)
      --version             Display version (1.6.0) and exit (useful for incorporating this in scripts)

For more details: https://github.com/jpconstantineau/go-find-duplicates
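
The file passed via -x/--exclusions is just a plain-text list with one file or directory name per line. The file name and entries below are only illustrative:

.git
node_modules
Thumbs.db

go-find-duplicates -x exclusions.txt {dir-1}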

Running this through a Docker container

docker run --rm -v /Volumes/PortableHD:/mnt/PortableHD manumk/go-find-duplicates:latest go-find-duplicates -o print /mnt/PortableHD

In the above command:

  • option --rm removes the container when it exits
  • option -v mounts the host directory /Volumes/PortableHD as /mnt/PortableHD inside the container

How does this identify duplicates?

By default, this tool considers files to be duplicates only if all of the following conditions match:

  1. the file extension is the same
  2. the file size is the same
  3. the CRC32 hash of the "crucial bytes" is the same

If the above default isn't enough for your requirements, you can use the --thorough command line option to switch to a SHA-256 hash of the entire file contents. But remember, this makes the scan much slower!

When tested on my portable hard drive containing >172k files (videos, audio files, images and documents), the results with and without the --thorough option were the same!
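
For the curious, below is a minimal Go sketch of this kind of two-level check. It is not the tool's actual code: which bytes count as "crucial" is internal to go-find-duplicates, so the sketch simply CRC32-hashes the first 64 KiB of each file, and it includes a SHA-256 full-file variant to mimic what --thorough does.

// sketch.go — an illustrative sketch only, not go-find-duplicates' real code.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/crc32"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// quickDigest CRC32-hashes at most the first n bytes of the file.
// (Which bytes the real tool treats as "crucial" is an internal detail;
// hashing the head of the file is just an assumption made for this sketch.)
func quickDigest(path string, n int64) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := crc32.NewIEEE()
	if _, err := io.Copy(h, io.LimitReader(f, n)); err != nil {
		return "", err
	}
	return fmt.Sprintf("%08x", h.Sum32()), nil
}

// thoroughDigest SHA-256-hashes the entire file, mimicking --thorough.
func thoroughDigest(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: sketch <dir>")
		os.Exit(1)
	}
	groups := map[string][]string{} // key -> paths that share that key
	filepath.WalkDir(os.Args[1], func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return nil // skip unreadable entries and directories
		}
		info, err := d.Info()
		if err != nil {
			return nil
		}
		// Swap in thoroughDigest(path) here for --thorough-like behaviour.
		digest, err := quickDigest(path, 64*1024)
		if err != nil {
			return nil
		}
		// Only files with the same extension, size and digest end up in one group.
		key := fmt.Sprintf("%s|%d|%s", filepath.Ext(path), info.Size(), digest)
		groups[key] = append(groups[key], path)
		return nil
	})
	for _, paths := range groups {
		if len(paths) > 1 {
			fmt.Println(paths) // each printed slice is one set of likely duplicates
		}
	}
}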