Some Python scripts to deduplicate some home pics and videos
Dedup process
- Hash all images + videos and keep one file per unique hash.
- Duplicate HEIC and JPG: go through the ~/Desktop/hash-all.txt list and match on IMG_XXXX.ext; if the same IMG_XXXX exists as both .heic and .jpg, drop the .jpg.
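A minimal sketch of the hashing step above, not the actual hash.py. It assumes md5 over file contents (the notes mention md5 fingerprints), an assumed set of media extensions, and that the first path seen for a digest is the one kept. The names `MEDIA_EXTS`, `md5_of` and `unique_by_hash` are hypothetical.

```python
import hashlib
from pathlib import Path

# Assumed set of media extensions; adjust to whatever the folders contain.
MEDIA_EXTS = {".heic", ".jpg", ".jpeg", ".png", ".mov", ".mp4"}


def md5_of(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unique_by_hash(roots):
    """Walk each root folder and keep the first path seen per content hash."""
    seen = {}
    for root in roots:
        # sorted() makes the choice of "first" path deterministic per folder.
        for path in sorted(Path(root).rglob("*")):
            if path.is_file() and path.suffix.lower() in MEDIA_EXTS:
                seen.setdefault(md5_of(path), path)
    return seen
```

Calling `unique_by_hash([...])` with the four ~/Downloads folders and printing `seen.values()` one per line would give a path list in the spirit of hash-all.txt. md5 is fine here because it only detects byte-identical copies; it is not a security check.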
python3 ./hash.py ~/Downloads/NZ-AU\ Adam\ Photos\ All\ Original ~/Downloads/December\ 2019\ NZ_AUS\ Trip\ \(Cris\ Phone\ Pics\) ~/Downloads/Dec\ 2019\ NZ_AU\ Trip\ Album ~/Downloads/Auckland\ NZ\ Pics > ~/Desktop/hash-all.txt
python3 ./heic_dedup_hash.py ~/Desktop/hash-all.txt > ~/Desktop/final-copy.txt
python3 ./copy_files.py ~/Desktop/final-copy.txt
grep EFFECTS ~/Desktop/hash-all.txt | xargs -I{} cp -f {} ~/Downloads/union
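The HEIC/JPG rule from the dedup process could look roughly like this. It is a sketch, not the actual heic_dedup_hash.py, and it assumes the list file holds one file path per line (the grep/xargs command above suggests that, but it is an assumption). The function name `drop_jpg_when_heic_exists` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

JPG_EXTS = {".jpg", ".jpeg"}


def drop_jpg_when_heic_exists(paths):
    """Drop the JPG copy when the same IMG_XXXX name also exists as HEIC."""
    # Map each base name (e.g. IMG_1234) to the set of extensions seen for it.
    exts_by_stem = defaultdict(set)
    for p in paths:
        pp = PurePosixPath(p)
        exts_by_stem[pp.stem.upper()].add(pp.suffix.lower())

    kept = []
    for p in paths:
        pp = PurePosixPath(p)
        has_heic = ".heic" in exts_by_stem[pp.stem.upper()]
        if pp.suffix.lower() in JPG_EXTS and has_heic:
            continue  # a HEIC with the same name exists, so skip the JPG
        kept.append(p)
    return kept
```

Matching on the bare name across folders follows the rule as written, but two phones can reuse the same IMG_XXXX counter for unrelated photos. Grouping by (folder, name) instead would be a stricter variant.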
- NZ AU leftovers-001.zip
  - Originally uploaded content that still has live-motion photos and misc videos. It was used to avoid re-uploading locally deduplicated media that had already been uploaded.
- final-dedup-missing-media-with-live-motion.zip
  - Final dataset. This dataset was not uploaded as is; the live-motion videos were removed from the live pics first, then it was uploaded. So only live motion pics (without paired videos) and actual non-motion video recordings were uploaded. This zip contains all media that remained after deduplicating both locally and against the previously uploaded "NZ AU leftovers" media.
- final_datetime_name_excluded_merge_stills.zip
  - This is what was actually uploaded on 8/8/22: the ~734 files that were missing from the Google Photos "leftovers" media.
- Auckland NZ Pics.zip
  - Random media
- Dec 2019 NZ_AU Trip Album.zip
  - Random media
- December 2019 NZ_AUS Trip (Cris Phone Pics).zip
  - Random media
- NZ-AU Adam Photos All Original-001.zip
  - Random media
- I was able to correctly map each live motion video to its live photo, and dedup the non-live photos via md5 fingerprints.
- Google Photos does not have an open API for uploading live photos :( So the best I can do is save stills for now (without video), in case Google Photos adds live photo upload in the future.
- Overall, this adds 734 missing still photos + actual non-live motion videos.
- ./file_exiftool.py collects the required EXIF tags into a JSON file.
- ./metadata_uuid_match.py reads that EXIF-tag JSON file and quickly dedupes on the EXIF tags or fingerprints.
- This seems to be the right approach, since the resulting media is indeed missing from Google Photos.
- Upon uploading, the media should sync to the correct date/time, and I can re-create the album to share.
- final-dedup-missing-media-with-live-motion.zip holds the media generated by this codebase. It has been deduped against both the local and the originally uploaded "leftovers" content, but the live motion videos are preserved in this zip. The live motion videos were stripped from the 8/8/22 upload and were not uploaded, since Google does not allow uploading merged live motions; only the iOS Google Photos app allows that. To get the live motion photos and non-motion video media, run ./file_exiftool.py and ./metadata_uuid_match.py again on the final_datetime_name_excluded_merge_stills set.
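A rough sketch of how live motion videos can be told apart from real recordings using the EXIF-tag JSON. This is not the actual metadata_uuid_match.py. It assumes the JSON is a list of objects with a "SourceFile" key (the shape `exiftool -json` produces) and that a live photo still and its paired video share a tag called "ContentIdentifier". The tag name and the names `load_records` and `plan_upload` are assumptions; the real scripts may match on other tags.

```python
import json
from pathlib import PurePosixPath

STILL_EXTS = {".heic", ".jpg", ".jpeg"}
VIDEO_EXTS = {".mov", ".mp4"}


def load_records(json_path):
    """Load the tag records; assumed to be a JSON list of per-file objects."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def plan_upload(records, id_tag="ContentIdentifier"):
    """Split files into (upload, held_back).

    Stills and standalone videos go to upload. A video that shares its
    identifier with a still is treated as a live motion video and held back.
    """
    def ext(rec):
        return PurePosixPath(rec["SourceFile"]).suffix.lower()

    # Identifiers carried by at least one still image.
    still_ids = {rec[id_tag] for rec in records
                 if ext(rec) in STILL_EXTS and rec.get(id_tag)}

    upload, held_back = [], []
    for rec in records:
        src = rec["SourceFile"]
        if ext(rec) in STILL_EXTS:
            upload.append(src)
        elif ext(rec) in VIDEO_EXTS:
            if rec.get(id_tag) in still_ids:
                held_back.append(src)  # paired live motion video
            else:
                upload.append(src)     # actual video recording
    return upload, held_back
```

Under these assumptions, `upload` corresponds to the stills-plus-real-videos set that was uploaded, and `held_back` to the live motion videos kept only in the zip.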