This tool is used for purging (deleting) duplicate files from your directory tree(s).
- I take photos and videos mostly on my OnePlus 6T phone. They land in `/DCIM/Camera`. I create folders within `/DCIM` for specific activities (such as `/DCIM/Dunes`) in an effort to keep the Camera folder sane.
- I periodically use the wonderful FolderSync Pro on my phone to push the entire `/DCIM` tree up to the file server on my LAN. I set up FolderSync to push the entire `/DCIM` tree "To remote folder" so that no modifications are ever made on the phone itself, and the DCIM tree on the LAN only accumulates files (FolderSync never deletes any). Note that files will accumulate in the `DCIM/Camera` folder on the LAN.
- On the phone I use the wonderful Simple Gallery Pro - Photo Manager & Editor as my gallery app. Periodically, when I feel like organizing photos/videos on my phone, I'll look through the Camera folder and move photos/videos to existing or new folders within `/DCIM`, such as `/DCIM/Dunes`.
- Now we have a problem... Since FolderSync runs push the full `/DCIM` tree to the LAN file server, I'll tend to accumulate copies of photos/videos in both the Camera and the Dunes folders. Without some scheme to control the mess, over time I'll end up with tons of duplicate files.
- photodedup is used to find all duplicate files within the LAN `DCIM/` tree and optionally delete duplicates that exist in the `DCIM/Camera` folder.
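
The single tree workflow above can be sketched in a few lines of Python. This is only an illustration, not photodedup's actual code: the function name `find_camera_duplicates` is hypothetical, and it assumes that two files are duplicates when filename, size, and modification time (in whole seconds) all match.

```python
import os
from collections import defaultdict


def find_camera_duplicates(main_root):
    """Return files directly in <main_root>/Camera that also exist elsewhere.

    Hypothetical helper: a file counts as a duplicate when its name, size in
    bytes, and modification time (truncated to whole seconds) all match.
    """
    camera_dir = os.path.join(main_root, "Camera")
    groups = defaultdict(list)  # (name, size, mtime) -> list of full paths
    for dirpath, _dirnames, filenames in os.walk(main_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            groups[(name, st.st_size, int(st.st_mtime))].append(path)

    candidates = []
    for paths in groups.values():
        in_camera = [p for p in paths if os.path.dirname(p) == camera_dir]
        elsewhere = [p for p in paths if os.path.dirname(p) != camera_dir]
        # Flag the Camera copy only when another copy survives elsewhere.
        if in_camera and elsewhere:
            candidates.extend(in_camera)
    return sorted(candidates)
```

The function only reports candidates; deleting them is left to the caller, which mirrors the list-by-default behavior described for the tool.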
- The primary usage is to delete files in the LAN-side `DCIM/Camera` folder that also exist in other folders within `DCIM/`, as described above.
- Nothing is deleted unless the `--purge` switch is specified. Without `--purge`, duplicates are only listed.
- You may have other photo trees from prior phones and backups on your hard disk. `--second-tree` mode supports deleting duplicate files from these older/other backups that also exist in the main tree (`main_root`). In the "Two tree mode, and --no-time and --no-size switches" example below, the file `China Wall Video.MOV` would be deleted from the secondary tree folder `/mnt/share/media/Pictures/Dunes/Dan` and retained in the main tree folder `/mnt/share/media/Chris phone pix/Dunes`.
- photodedup works on all file types, not just media files. When using `--second-tree` mode, duplicates found in both the main tree and the second tree may be deleted from the second tree regardless of file type/extension. Note that single tree mode specifically looks for duplicates relative to the `DCIM/Camera` folder, so this mode is pretty photo/video specific. Other tools for finding duplicates may be better suited to your needs, including CCleaner and Beyond Compare.
- Some evil file management tools clobber the file modification datetime on files. (Both FolderSync and Simple Gallery are well behaved.) This gives rise to duplicate photos in the trees that have different modification times. The `--no-time` switch may be specified to disable matching on datetime. See the "Two tree mode, and --no-time and --no-size switches" example below for file `IMG_20140426_115922_309.jpg`.
- Eventually you'll have files from different cameras that have the same filename but different datetime and size. photodedup will recognize these as different files and not attempt any purges. Alternatively, you may edit a file and save it to the same filename, producing a duplicate with a different size. To identify the original and edited versions as duplicates, specify the `--no-size` switch along with the `--no-time` switch. See the "Two tree mode, and --no-time and --no-size switches" example below for file `DSCF2363.JPG`. Three different versions of this file exist, some with differing datetime stamps. Manually deleting these duplicates may be most appropriate.
- Various tools may create `.Thumbnails` subdirectories. These may be found using the `--thumbs` switch, and deleted by also specifying `--purge`.
- photodedup was developed and tested on Linux and Python 3. It is not supported on Python 2.7, but seems to run fine. Issues may be encountered with file/path names containing Unicode characters / code points. If you have issues with Python 2.7, please use Python 3.
- 200602 v0.2 Fixed typo in argparse for --purge switch.
- Simply place the script file photodedup in a directory on your PATH environment variable, or run it by specifying the full path to the file.
```
$ ./photodedup -h
usage: photodedup [-h] [--no-size] [--no-time] [--second-tree SECOND_TREE]
                  [--thumbs] [--purge] [-V]
                  main_root

Find and delete duplicate files within a specified tree.

positional arguments:
  main_root             Root of the main tree. In single tree mode, duplicates in <main_root>/Camera will be deleted.

optional arguments:
  -h, --help            show this help message and exit
  --no-size             Disregard file size when identifying duplicates. Size is included by default.
  --no-time             Disregard file modification time when identifying duplicates. Time is included by default.
  --second-tree SECOND_TREE
                        Root of other archive tree. Duplicate files will be deleted from this tree.
  --thumbs              Find .Thumbnails directories. Delete with --purge.
  --purge               Do identified deletes. Without --purge, print duplicates only.
  -V, --version         Return version number and exit.
```
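
For readers who want to see how a command line like the one above maps onto `argparse`, here is a rough reconstruction based only on the help text. It is a sketch, not the script's source: the function name `build_parser` and the version string format are assumptions.

```python
import argparse


def build_parser():
    """Reconstruct a parser matching the help text (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="photodedup",
        description="Find and delete duplicate files within a specified tree.",
    )
    parser.add_argument(
        "main_root",
        help="Root of the main tree. In single tree mode, duplicates in "
             "<main_root>/Camera will be deleted.",
    )
    parser.add_argument("--no-size", action="store_true",
                        help="Disregard file size when identifying duplicates.")
    parser.add_argument("--no-time", action="store_true",
                        help="Disregard file modification time when identifying duplicates.")
    parser.add_argument("--second-tree",
                        help="Root of other archive tree. Duplicate files will "
                             "be deleted from this tree.")
    parser.add_argument("--thumbs", action="store_true",
                        help="Find .Thumbnails directories. Delete with --purge.")
    parser.add_argument("--purge", action="store_true",
                        help="Do identified deletes. Without --purge, print duplicates only.")
    # The version string format below is an assumption for illustration.
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s v0.2",
                        help="Return version number and exit.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Every destructive switch defaults to off, so running with only `main_root` lists duplicates without deleting anything.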
```
$ ./photodedup /mnt/share/media/Chris\ phone\ pix/
Checking for files with MATCHING timestamp and MATCHING size.
File <20170321_221733-1.jpg> has 2 copies
Tue Mar 21 23:18:44 2017 2032139 bytes /mnt/share/media/Chris phone pix/Pets
Tue Mar 21 23:18:44 2017 2032139 bytes /mnt/share/media/Chris phone pix/Camera
File <20181022_114054.jpg> has 2 copies
Mon Oct 22 11:40:54 2018 7732960 bytes /mnt/share/media/Chris phone pix/Pets
Mon Oct 22 11:40:54 2018 7732960 bytes /mnt/share/media/Chris phone pix/Camera
File <20181102_160747.mp4> has 2 copies
Fri Nov 2 16:08:24 2018 69911939 bytes /mnt/share/media/Chris phone pix/Camera
Fri Nov 2 16:08:24 2018 69911939 bytes /mnt/share/media/Chris phone pix/Dunes
...
Found 25 duplicate files. Deleted 0 duplicate files.
```
Two tree mode, and --no-time and --no-size switches:

```
$ ./photodedup /mnt/share/media/Chris\ phone\ pix/ --second-tree /mnt/share/media/Pictures --no-time --no-size
Checking for files with ANY timestamp and ANY size.
Operating on --second-tree only
Found instance(s) of File <China Wall Video.MOV> in second tree that exist in the main tree:
Main tree: Sat Feb 7 12:09:20 2015 100288778 bytes /mnt/share/media/Chris phone pix/Dunes
Second tree: Sat Feb 7 12:09:20 2015 100288778 bytes /mnt/share/media/Pictures/Dunes/Dan
...
Found instance(s) of File <IMG_20140426_115922_309.jpg> in second tree that exist in the main tree:
Main tree: Sat Apr 26 11:59:22 2014 1526723 bytes /mnt/share/media/Chris phone pix/HPVC'14
Second tree: Mon Apr 28 10:26:31 2014 1526723 bytes /mnt/share/media/Pictures/NAU HPVC 2014
...
Found instance(s) of File <P9110100.JPG> in second tree that exist in the main tree:
Main tree: Tue Sep 11 11:04:06 2018 2988468 bytes /mnt/share/media/Chris phone pix/CO Springs
Second tree: Tue Sep 11 11:04:06 2018 2988468 bytes /mnt/share/media/Pictures/CO Springs Sep'18
...
Found instance(s) of File <DSCF2363.JPG> in second tree that exist in the main tree:
Main tree: Sat Dec 31 11:48:24 2016 4146999 bytes /mnt/share/media/Chris phone pix/Bottlebrush
Second tree: Sun Oct 4 17:53:38 2009 809034 bytes /mnt/share/media/Pictures/not
Second tree: Sun Aug 5 20:28:50 2007 803603 bytes /mnt/share/media/Pictures/Mabul07
Second tree: Sun Aug 5 20:28:50 2007 803603 bytes /mnt/share/media/Pictures/Mabul07/Select
Second tree: Wed Apr 10 13:50:56 2013 4146999 bytes /mnt/share/media/Pictures/Bottlebrush
Second tree: Mon Dec 21 14:45:56 2009 3119118 bytes /mnt/share/media/Pictures/Family gatherings/Xmas 09
Found 964 duplicate files. Deleted 0 duplicate files.
```
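
The two tree run shown above could be approximated as follows. This is a minimal sketch under stated assumptions, not the script's real logic: `index_tree` and `second_tree_duplicates` are hypothetical names, the two trees are assumed not to overlap, and files are matched on name plus optional size and modification time.

```python
import os


def index_tree(root, use_size=True, use_time=True):
    """Map a match key to the list of full paths under root (hypothetical)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            key = (name,
                   st.st_size if use_size else None,
                   int(st.st_mtime) if use_time else None)
            index.setdefault(key, []).append(path)
    return index


def second_tree_duplicates(main_root, second_root,
                           use_size=True, use_time=True, purge=False):
    """Return second-tree files that also exist in the main tree.

    Files are removed only when purge=True, and only from the second tree;
    the main tree is never modified. Assumes the trees do not overlap.
    """
    main_index = index_tree(main_root, use_size, use_time)
    second_index = index_tree(second_root, use_size, use_time)
    found = []
    for key, paths in second_index.items():
        if key in main_index:
            found.extend(paths)
    found.sort()
    if purge:
        for path in found:
            os.remove(path)
    return found
```

Calling it with `use_size=False, use_time=False` corresponds to the `--no-time --no-size` run: every second-tree file whose name appears anywhere in the main tree is reported, so reviewing the list before passing `purge=True` is advisable.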
- none
- 200602 v0.2 Fixed --purge help typo
- 200306 v0.1 New