/missing

WORK IN PROGRESS. Given two file structures this program will check for files that exist in one structure but are *missing* from the second.

Primary LanguageRubyOtherNOASSERTION

Build Status Code Climate Test Coverage

Missing

Given two file structures this program will check for files that exist in one structure but are missing from the second. Created to check for photos that have not be moved into a master archive before deletion.

Itch

Lost photos! Our household has two iPhones, one Android tablet and one digital camera all of which are used to take photos. Photos are shared to Dropbox using the DropBox Camera Upload functionality. Under the Dropbox top level a Camera Upload directory is created which is shared to each other (we have separate DropBox accounts) using a DropBox share. Whoever gets to the photos first, sorts, categorises and moves the photos to our main photo library (also shared via DropBox).

If the DropBox photo upload is cancelled part way through or something else goes wrong, Dropbox thinks it has uploaded photos when it hasn't. This means it won't touch the files on the device and the photos remain there. As Dropbox manipulates the file name on import to match the date and time, it is not possible to manually check which files have been uploaded or not.

Scratch

This program checks if the photos (or any files) that are manually pulled from the device, exist in the main photo library.

The <master library path> and <possible duplicates path> are passed in as command line options. New / non-duplicate files are copied to a directory named new under the possible duplicates directory. If the <rename flag> is set to yes the files are renamed to DropBox photo import file name format (%Y-%m-%d %H.%M.%S.jpg). For photos with EXIF date and time properties, this will be used to derive the filename as per DropBox functionality, else the file timestamp is used.

Existing / duplicate files are copied to a directory named duplicates under the <possible duplicates path> and are renamed to include the name of the master file, for example DSC00001.jpg duplicate of 2014-09-03 21.49.37.jpg.

How it works?

Files are compared using MD5 checksums. New files are optionally named using EXIF time and date properties to the DropBox photo file name format:

%Y-%m-%d %H.%M.%S.jpg

Of course there are many other use cases for type of program.

If you are looking for a Ruby program to check for duplicate files and photos this code appears to work quite well.

Usage

Organise both master library and possible new/duplicates under one top level directory structure respectively.

ruby missing.rb  -l <master library path> -d <duplicates path> -r <rename flag> 

The following options are available:

-l   master / library path 
-s   (possible) duplicates path 
-r   flag to indicate whether to rename file to DropBox format (Y or N) 

Limitations

Obviously if you have played around (resized, touched up, etc.) the photos before storing them in the photo archive, this method will not work. You have to get a bit cleverer in order to solve this problem (see here and here).

Unit Tests

MiniTest unit tests can be found in the test directory.

Tests can be run using:

rake test

or with Guard:

bundle exec guard

Test Scenarios

The test cases utilise a set of sample photos found in:

  • test/files/master_library.
File Name EXIF Date File Date
2014-09-05 15.52.34.jpg
2014-09-05 19.37.17.jpg
  • test/files/possible_dups

When run with <rename flag> equal to Y:

File Name EXIF Date File Date Duplicate of Resultant Filename
DSC00001.jpg 2014-09-05 19.37.17.jpg duplicates/DSC00001.jpg duplicate of 2014-09-03 21.49.37.jpg
DSC00002.jpg "2014-09-05 20:22:59 +1000" Not duplicate new/2014-09-05 20.22.59.jpg
DSC00003.jpg nil Not duplicate new/file date
DSC00004.jpg 2014-09-05 15.52.34.jpg duplicates/DSC00004.jpg duplicate of 2014-09-05 15:52.34.jpg

To Do

  • Finalise coding
  • Tests

License

MIT. Use and abuse. No comeback.