TheLastGimbus/GooglePhotosTakeoutHelper

Not all photos present in yyyy-mm-dd named folders

Closed this issue · 17 comments

The instructions indicate

Before running this script, you need to cut out all folders that aren't dates
That is, all album folders, and everything that isn't named
2016-06-16 (or with "#", they are good)
See README.md or --help on why
(Don't worry, your photos from albums are already in some date folder)

This is however not true in my experience. I have exported photos from two different Google Accounts and each contains hundreds of photos that exist in custom named album folders which do not exist in any of the folders named yyyy-mm-dd.

Are you sure? Try to find those as hard as possible - because if it's true, then we have a problem 😕

Well, here's what I did to confirm this:

Downloaded the takeout zip file and extracted the contents to ~/takeout
Moved all subfolders that weren't named as a date to ~/takeoutalbums
Ran find -type f -name "*.jpg" -exec md5sum '{}' \; > md5sumdates.txt in the ~/takeout directory and ran find -type f -name "*.jpg" -exec md5sum '{}' \; > md5sumalbums.txt in the ~/takeoutalbums directory
With the two listings of file hashes, I opened both in Excel to do some formatting/comparison.
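For anyone who wants to repeat this check without Excel, the steps above can be sketched in Python. This is only an illustration: the function names (`hash_file`, `hashes_under`, `compare_trees`) are made up for this example and are not part of the script, and the `*.jpg` pattern is case-sensitive on Linux just like the `find -name "*.jpg"` command above.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Set, Tuple


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_under(root: Path, pattern: str = "*.jpg") -> Dict[str, List[Path]]:
    """Map each content hash to every matching file found below root."""
    found: Dict[str, List[Path]] = {}
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            found.setdefault(hash_file(path), []).append(path)
    return found


def compare_trees(dates_root: Path, albums_root: Path) -> Tuple[Set[str], Set[str]]:
    """Return (hashes only in the date folders, hashes only in the album folders)."""
    dates = hashes_under(dates_root)
    albums = hashes_under(albums_root)
    return set(dates) - set(albums), set(albums) - set(dates)
```

Calling `compare_trees(Path.home() / "takeout", Path.home() / "takeoutalbums")` would give the two sets of hashes that exist on one side only; the dictionaries returned by `hashes_under` can then be used to look up which files those hashes belong to.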

The date based folders contained 4151 jpg files. After deduplicating the hashes, 3374 unique hashes/jpg files remained. The album based folders contained 5268 jpg files. After deduplicating the hashes, 5147 unique hashes/jpg files remained. Already this indicates an issue as there are more unique hashes in the album folders than in the date folders.

I used the Excel MATCH function to compare the hashes found in the dates folder vs the albums folder. There were 449 hashes/jpg files that existed in the date based folders that did not exist in the album based folders. The real concern is that there are 2222 hashes/jpgs found in the album based folders that did not exist in the date based folders.
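As a sanity check on the figures above, the number of hashes shared by both sides should come out the same whichever side it is derived from. A short sketch using only the counts quoted in this comment (the variable names are just labels for this example):

```python
# Counts reported above, after deduplicating hashes on each side.
unique_in_dates = 3374
unique_in_albums = 5147
only_in_dates = 449
only_in_albums = 2222

# Hashes present on both sides, derived independently from each side.
shared_from_dates = unique_in_dates - only_in_dates
shared_from_albums = unique_in_albums - only_in_albums

# Both derivations agree, so the reported numbers are internally consistent.
assert shared_from_dates == shared_from_albums == 2925
```

So 2925 unique photos appear in both places, and 2222 of the 5147 unique album photos have no copy in any date folder.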

This can be solved by #10 once we come to an agreement on how we handle album/dir info. I notice a lot of named directories were folders I initially uploaded to Google vs photos that were synced directly to the service.

I would appreciate true support for albums. That said, it seems prudent to more immediately either build in support for folders named after albums now (across my 95 album folders, I only had 3 jpgs which would have required using the folder name to derive date -- maybe just skip processing those instead of aborting the whole script), or at least remove the notice

(Don't worry, your photos from albums are already in some date folder)

and replace it with a proper warning instead.

Good thing I kept the original Takeout archives :P

Good thing I kept the original Takeout archives :P

You could've also run another takeout. Unless you preemptively deleted all your Google Photos.

more immediately either build in support for folders named after albums now (across my 95 album folders, I only had 3 jpgs which would have required using the folder name to derive date -- maybe just skip processing those instead of aborting the whole script), or at least remove the notice

I plan to build support for both folders and albums in some capacity in #10. In #11 I added the hashing needed to compare files to determine on an image level if they are the same photo or not. Extending this to identify which photos belong to a list of photo albums or folders should be straightforward now. The main issue is, how do we output this?

  1. Create a json file that contains a list of image names per tag. (not friendly to non-developers)
  2. Keep duplicate images in the folders as well as in the root folder. (easy to understand but...duplicates bad)
  3. We looked into using shortcuts, but this won't work on all filesystems, including object stores.

So it's just not clear what exactly to do yet...fastest solution that everyone can understand is option 2 so I think I would start there.

I have some thoughts on that which I can add under issue #10 / #11, but as that seems like a longer term effort, should the documentation and code be updated now to better indicate that photos in Albums may be lost with the current code?

I think that makes sense.

I've made the wording changes in the readme and code. I don't currently have permissions to the repository and, to be honest, probably won't have time to work on #10 or #11 in the future. For now, I've broken protocol and added my changes to a fork at https://github.com/rtadams89/GooglePhotosTakeoutHelper/commit/63b6e4d56dac5988ae9bd1b50c825816e73d7212

No protocol broken; you can just make a pull request from your fork.

That should resolve this issue. I'll take a look at #10 and #11 when I get a little time.

Looking at all of this, and wrapping my thoughts:

I notice a lot of named directories were folders I initially uploaded to Google vs photos that were synced directly to the service.

Okay, so it looks like the "photos from album folders are not in date folders" case affects people who uploaded whole folders or batches of photos through the desktop app or some other way. 99% of people just install the app and let it run, so I'm fairly confident my code didn't break photos for a lot of people - but this still needs to be fixed

I think we have a clear path of what needs to be done:

  1. Fix #8 and all "json not found" errors by finding jsons based on their "title" tag instead of their file name. I think that should reduce the number of cases where the .json file was not found to near 0 (or even literally 0!)
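A rough sketch of what title-based lookup could look like. It assumes each Takeout .json file is a JSON object whose top-level "title" value equals the photo's file name; that assumption, and the helper names `index_json_by_title` and `find_json_for`, are illustrative only and not taken from the script.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def index_json_by_title(folder: Path) -> Dict[str, Path]:
    """Map the "title" value of every readable .json file in folder to its path."""
    index: Dict[str, Path] = {}
    for json_path in sorted(folder.glob("*.json")):
        try:
            data = json.loads(json_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed file: skip it
        title = data.get("title") if isinstance(data, dict) else None
        if isinstance(title, str):
            # Keep the first file seen for a given title.
            index.setdefault(title, json_path)
    return index


def find_json_for(photo: Path, index: Dict[str, Path]) -> Optional[Path]:
    """Look up metadata by the photo's file name, ignoring how the .json is named."""
    return index.get(photo.name)
```

With an index like this, a metadata file with an unexpected name would still be found as long as its "title" matches the photo, and `find_json_for` returns `None` for the remaining "json not found" cases.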

With this done, we could let the script run inside the "album folders", and just not copy duplicates.

If the number of "json not found" errors is near 0, we could just move those files to some special "failed" folder, to be handled manually by the user later.
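The two ideas above (skip duplicates, set aside files without metadata) might be combined roughly as follows. This is a sketch under stated assumptions: `process_folder`, `file_digest` and the `has_metadata` callback are hypothetical names, files are copied rather than moved so the input stays untouched, and name collisions between different photos are not handled.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable, Dict, Iterable, Set


def file_digest(path: Path) -> str:
    """MD5 of the whole file; fine for a sketch, chunked reads suit large files better."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def process_folder(
    photos: Iterable[Path],
    output_dir: Path,
    failed_dir: Path,
    seen: Set[str],
    has_metadata: Callable[[Path], bool],
) -> Dict[str, int]:
    """Copy each new photo once; put photos without metadata into failed_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    counts = {"copied": 0, "duplicate": 0, "failed": 0}
    for photo in photos:
        if not has_metadata(photo):
            # Left for the user to handle manually later.
            failed_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(photo, failed_dir / photo.name)
            counts["failed"] += 1
            continue
        digest = file_digest(photo)
        if digest in seen:
            counts["duplicate"] += 1  # same content already copied earlier
            continue
        seen.add(digest)
        shutil.copy2(photo, output_dir / photo.name)
        counts["copied"] += 1
    return counts
```

Passing the same `seen` set to every call, first for the date folders and then for the album folders, is what keeps album copies of already-processed photos from being written twice.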

Tho, 99% of people do have near-full duplicated albums, so it would generally slow it down because of the hashing thing. So:

  2. Add support for albums. How?

Create a json file / Keep duplicate images in the folders (as well as / additionally) in the root folder / using shortcuts

Why not let the user decide? "You can have them by shortcuts, but that may not work on all systems, or just copied to separate folder - which of those/maybe both?"
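To make the "let the user decide" idea concrete, here is a minimal sketch of one function that supports all three variants behind a mode switch. Everything here is hypothetical: the mode names, the `add_to_album` function and the `album.json` file name are invented for illustration, and "shortcut" is approximated with a symbolic link, which may need extra privileges on Windows and does not exist on object stores.

```python
import json
import os
import shutil
from pathlib import Path

ALBUM_MODES = ("copy", "shortcut", "json")


def add_to_album(photo: Path, album_dir: Path, mode: str) -> Path:
    """Record that photo belongs to the album in album_dir, using the chosen mode."""
    if mode not in ALBUM_MODES:
        raise ValueError(f"unknown album mode: {mode!r}")
    album_dir.mkdir(parents=True, exist_ok=True)

    if mode == "copy":
        # Duplicate the file: easy to understand, costs extra disk space.
        target = album_dir / photo.name
        shutil.copy2(photo, target)
        return target

    if mode == "shortcut":
        # Symbolic link pointing back at the single real copy.
        target = album_dir / photo.name
        os.symlink(photo.resolve(), target)
        return target

    # mode == "json": keep a plain list of file names per album.
    listing = album_dir / "album.json"
    names = json.loads(listing.read_text(encoding="utf-8")) if listing.exists() else []
    if photo.name not in names:
        names.append(photo.name)
    listing.write_text(json.dumps(names, indent=2), encoding="utf-8")
    return listing
```

Starting with only the "copy" branch and adding the other two later would match the plan of prioritizing one method first.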

By the way - I should probably merge #18 before making any above changes, it will make stuff easier

Oh, you closed this while I was writing this 😅 This issue is very much open, and adding a warning does not solve that 😕

@TheLastGimbus Re: 2. I agree that letting the user decide is great, but I don't want to implement all of those in one go. I want to prioritize one method. For me, the simplest thing is to create duplicates in another folder. Once that works, implementing the variants should be very straightforward. So I will start by implementing that variation and get it merged. Then I or others can add the shortcut/json version after.

You could've also run another takeout. Unless you preemptively deleted all your Google Photos.

I didn't delete my Google Photos, but if possible I'll avoid running another Takeout. I'm just annoyed at the fact that you can download individual Takeout archives only once. I actually had to run 3 Takeouts before I managed to download them: the first time I canceled the download because the destination I chose wouldn't have enough space, the second time the page crashed while downloading and it still wouldn't re-download the archive. Third time lucky...

You shouldn't need to stay on the same page to download your takeout. Once it's started you can return to https://takeout.google.com/takeout/downloads to see progress and download finished takeouts that remain for a few days after the takeout is completed.

🎉

Will push a new version to PyPI very soon

pip install google-photos-takeout-helper==2.0.0rc1