/gp-takeout-json-fix

A script to fix the naming issues of the media companion json files generated by Google Photos Takeout.

Primary language: Python. License: MIT.

Google Photos Takeout Pre-Exif Organiser

As we all know, Google Photos is utterly pathetic when it comes to taking out media files.

After you've untarred/unzipped your takeout archives, the following issues are present:

  • All media files have incorrect or absent Exif data (this data is in related .json files).
  • Some media files have no companion json files.
  • Media files with long names have cut-off companion json file names, e.g.:
Photo:      IMG_123456790.jpg
Companion:  IMG_123456790.j.json
  • Some jpg files have companion files that use the jpeg extension instead, e.g.:
Photo:      IMG_1234.jpg
Companion:  IMG_1234.jpeg.json
  • Some media files and companion files have inconsistent extension casing, e.g.:
Photo:      IMG_1234.jpg
Companion:  IMG_1234.JPG.json
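
The three naming problems above (truncation, jpeg vs jpg, and extension casing) can be illustrated with a small matching check. This is only a sketch of the idea, not the logic in fixgptakeout.py: the names `is_companion` and `_normalise` are hypothetical, and it assumes truncation only ever cuts into the extension, as in the example above.

```python
from pathlib import PurePath

# Hypothetical helpers for illustration; not taken from fixgptakeout.py.
JPEG_ALIASES = {"jpeg": "jpg"}


def _normalise(name: str) -> str:
    """Lowercase a file name and map a .jpeg extension to .jpg."""
    p = PurePath(name.lower())
    ext = p.suffix.lstrip(".")
    return f"{p.stem}.{JPEG_ALIASES.get(ext, ext)}" if ext else p.stem


def is_companion(media_name: str, json_name: str) -> bool:
    """Return True if json_name plausibly belongs to media_name."""
    if not json_name.lower().endswith(".json"):
        return False
    base = json_name[: -len(".json")].lower()  # e.g. "img_123456790.j"
    media = _normalise(media_name)
    if _normalise(base) == media:
        return True  # same name after fixing casing and jpeg/jpg
    # Truncated companion: the base must cover the whole stem and be a
    # prefix of the media name (assumption: only the extension is cut).
    stem = PurePath(media).stem
    return base.startswith(stem) and media.startswith(base)


print(is_companion("IMG_123456790.jpg", "IMG_123456790.j.json"))
print(is_companion("IMG_1234.jpg", "IMG_1234.jpeg.json"))
print(is_companion("IMG_1234.jpg", "IMG_9999.jpg.json"))
```

A real fix would then rename the matched companion to the media file's exact name plus .json.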

How to fix this

python3 fixgptakeout.py [dir]

Where [dir] is the directory to recursively fix. Right after unarchiving your takeout you can use "Takeout/Google Photos" for example.

Why this matters

ExifTool

The awesome ExifTool project can be used to automatically import the json data as exif data into the relevant media files, but only if the json and media files are named perfectly consistently.

For the sake of interest, here is the command that does the exif fix:

exiftool -r -d %s -tagsfromfile "%d/%F.json" "-GPSAltitude<GeoDataAltitude" "-GPSLatitude<GeoDataLatitude" "-GPSLatitudeRef<GeoDataLatitude" "-GPSLongitude<GeoDataLongitude" "-GPSLongitudeRef<GeoDataLongitude" "-Keywords<Tags" "-Subject<Tags" "-Caption-Abstract<Description" "-ImageDescription<Description" "-DateTimeOriginal<PhotoTakenTimeTimestamp" -ext '*' -overwrite_original --ext json [dir]

The %d/%F.json part specifies that the companion json files will be named exactly the same as the related media files (with a lowercase extension) and .json appended to the end.
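
Before running the exiftool command, it may help to check which media files still lack a companion. The sketch below is not part of this repo; the function name `media_without_companion` is hypothetical, and it assumes a companion must be named exactly like its media file with .json appended (it ignores the lowercase-extension detail mentioned above).

```python
import os
from typing import List


def media_without_companion(root: str) -> List[str]:
    """Return media files under root that lack an exactly named '<file>.json'."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            if name.lower().endswith(".json"):
                continue  # companion files themselves are skipped
            if name + ".json" not in names:
                missing.append(os.path.join(dirpath, name))
    return sorted(missing)
```

Calling `media_without_companion("Takeout/Google Photos")` after the json fix should ideally list only the media files that never had a companion in the first place.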

Chevereto

Chevereto is an open-source photo hosting app that has native support for importing Google Photos Takeout images and parsing the related json files, but obviously only if they are named consistently.

Deduplication

The takeout does a lot of unnecessary media duplication. Specifically, media items that exist in more than one album are fully copied to every relevant album-folder, including the generated by-year albums. So, if you have a 512 MB video in a road-trip album from 2016, that video will also exist (as a full copy) in the album-folder Photos from 2016, and the two copies together take up 1 GB of space. It's even worse if you have media items in three or more albums.

The fix

python3 dedup.py [dir]

Where [dir] is the parent directory that contains all the album-folders.

The following will happen:

  • Media items in any user albums will be deleted from the relevant by-year album-folder.
Album-folder        Media files
Road trip '16       img123.jpg, img124.jpg
Photos from 2016    img123.jpg, img124.jpg (deleted)
  • Media items that occur in multiple user albums will be moved to "multi-album" folders and deleted from their source album-folders so that there is only one copy of said media items.
Album-folder                                    Media file
Road trip '16                                   img123.jpg (removed)
Road trips mega-album                           img123.jpg (removed)
[New] Road trip '16 _, Road trips mega-album    img123.jpg
  • If a media item has a companion .json file, that file will be moved/deleted along with it. Thus, it's crucial to first run the above json fix script to get the naming right.
Album-folder        Files
Road trip '16       img123.jpg, img123.jpg.json
Photos from 2016    img123.jpg, img123.jpg.json (both deleted)
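
The rules above can be sketched as a pure planning step that decides what to delete and what to move before touching any files. This is an illustration, not the code in dedup.py: `plan_dedup` is a hypothetical name, the "Photos from YYYY" pattern for by-year album-folders is an assumption based on the example, and the naming of the multi-album destination folder is deliberately left out.

```python
import re
from collections import defaultdict

# Assumed naming of the generated by-year album-folders.
BY_YEAR = re.compile(r"^Photos from \d{4}$")


def plan_dedup(albums):
    """Plan deletions and moves without touching the file system.

    albums maps album-folder name -> {file name: content digest}.
    Returns (deletions, moves): deletions is a sorted list of
    (album, file) pairs to delete from by-year folders, and moves maps
    a digest to the sorted user albums that currently hold a copy.
    """
    holders = defaultdict(list)  # digest -> [(album, file), ...]
    for album, files in albums.items():
        for name, digest in files.items():
            holders[digest].append((album, name))

    deletions, moves = [], {}
    for digest, copies in holders.items():
        user = [(a, n) for a, n in copies if not BY_YEAR.match(a)]
        yearly = [(a, n) for a, n in copies if BY_YEAR.match(a)]
        if user:
            deletions.extend(yearly)  # a user album keeps the item
        if len(user) > 1:
            moves[digest] = sorted(a for a, _ in user)  # multi-album item
    return sorted(deletions), moves
```

Keeping the plan separate from the file operations makes it easy to print and review what would happen before anything is deleted.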

Notes:

  • Duplication checking happens by way of MD5 hashing of file contents rather than by file name, because different photos may coincidentally have the same name.
  • It's probably best to run the deduplication before you make exif changes to the images. You never know with Google: the same image could have a json file in one album-folder but not in another. The resulting exif difference would cause the two instances to hash differently, even though they are the same image.
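
As a rough illustration of the hashing note above, duplicates can be found by grouping files on the MD5 digest of their contents. This is a minimal sketch, not the code in dedup.py; `md5_of` and `group_by_content` are hypothetical names, and reading in 1 MiB chunks is just one reasonable way to avoid loading large videos into memory.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_content(root):
    """Map digest -> sorted paths for media files that share content."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".json"):
                continue  # companions follow their media file
            path = os.path.join(dirpath, name)
            groups[md5_of(path)].append(path)
    # Only digests seen more than once are duplicates.
    return {d: sorted(p) for d, p in groups.items() if len(p) > 1}
```

Two files with the same name but different contents end up under different digests, so they are never treated as duplicates.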

Why

  • Reduces space usage
  • Makes importing into a photo hosting app easier. Now you can simply select a folder and add all its contents to the album/s it's named after. Or even write a script to do it with some cheeky API calls.

Posterity

Because Google often changes its APIs on a whim, I fully expect these scripts and the related exiftool command to stop working at some point in the future. But as of January 2021 they work, so Takeout your photos and use them while you can!