Simple duplicate image finder

Primary language: Go. License: Apache License 2.0 (Apache-2.0).

Dedugo

Summary

This simple program compares two directories of images and finds images that are visually similar. It aims to be fast and easy to use.

Both directories are searched recursively for images in any supported format (.jpg, .png, .heic).

Usage

There are three phases to using this tool: finding duplicate images, confirming the detected duplicates, and deleting the confirmed duplicates.

Finding Duplicates

The first argument is the directory of images to match against. Think of this directory as the "originals", which should never be deleted. The second argument is the directory containing images that may be duplicates of the originals, and from which you may want to delete files.

dedugo find-duplicates ./reference/image/directory ./evaluation/image/directory

Things will happen. Silicon will get hot. Fans will spin.

Checking Results

The check-results subcommand lets the user visually confirm whether each detected pair is actually a duplicate. No similarity algorithm is perfect, so false positives are likely; this step lets you reject them before anything is deleted.

dedugo check-results

Deleting Duplicates

Once duplicate images are confirmed, they can be deleted in one fell swoop by running:

dedugo delete-duplicates

To Do

  • Allow user to visually confirm if paired images are indeed duplicates or are actually just very similar
  • Convert this to use Cobra
  • A GUI would be neat
  • Add delete command to remove confirmed duplicates
  • Make image loading faster
  • Allow user to specify output filename for find-duplicates and input filename for check-results
  • I probably need to incorporate the idea of similar image clusters rather than just image pairs.
  • Write tests...
  • GUI should show if an image has already been confirmed and allow user to unmark it.
  • Prevent system sleep while running find-duplicates.

Thanks

Special thanks to Vitali Fedulov for writing the Go package on which this tool is built.

Thanks also to jdeng for writing the HEIF decoder, and to adrium for maintaining a fork of it that runs on my Linux install.