drewnoakes/metadata-extractor-dotnet

Support detection of Placeholder files

HEIC-to-JPEG-Dev opened this issue · 5 comments

When getting metadata from photos, everything is fine until it hits a Placeholder file (OneDrive/iCloud, etc.).

For those files, the metadata cannot be extracted without downloading the file from the cloud, which is slow, requires an internet connection, and defeats the disk-saving purpose of those files.

The information is available from the file, just not by opening it. For example, the Windows property system holds a subset of properties (depending on the sync provider), which can also be read through the system indexer.

Is there any chance of this being implemented? I understand that it would be a Windows-only feature.

I'm not familiar with these placeholder files. Are they still real files that can be opened and parsed? If so, we'd be open to discussing further and might ultimately accept a PR.

They are still files, but sparse files: they are under 1 KB in size, regardless of how big the real file is. The real file is kept in the sync provider's cloud service (iCloud, OneDrive, Google, etc.).

If you try to open the file, the Windows kernel instructs the sync provider to download the file and make it fully available.

So when I use MetadataExtractor on one of these files, it technically works, because your code doesn't see what goes on behind the scenes. But it causes every file to be downloaded, which defeats the whole point of these files.

Placeholder-aware code should instead identify that the file is a sparse file, ask the Windows property system which properties are available (think EXIF data), and then request those properties. This is very fast.
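The first step, spotting a placeholder without opening it, might look roughly like this Python sketch. It assumes that the Win32 attribute bits `FILE_ATTRIBUTE_OFFLINE`, `FILE_ATTRIBUTE_RECALL_ON_OPEN` and `FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS` (values from `winnt.h`) are what mark a cloud placeholder, and that reading attributes from a directory listing via `os.scandir` does not trigger a download. The sparse bit alone is deliberately not treated as a placeholder, since ordinary files can be sparse too. `is_placeholder_attributes` and `find_placeholders` are hypothetical names, not part of MetadataExtractor, and the property-system query is not shown.

```python
import os
import sys

# Win32 file attribute bits, with values as defined in winnt.h.
FILE_ATTRIBUTE_SPARSE_FILE = 0x00000200
FILE_ATTRIBUTE_OFFLINE = 0x00001000
FILE_ATTRIBUTE_RECALL_ON_OPEN = 0x00040000
FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS = 0x00400000

# Assumption: any of these bits means the file's contents may not be local,
# so opening or reading it could make the sync provider download it.
PLACEHOLDER_BITS = (
    FILE_ATTRIBUTE_OFFLINE
    | FILE_ATTRIBUTE_RECALL_ON_OPEN
    | FILE_ATTRIBUTE_RECALL_ON_DATA_ACCESS
)


def is_placeholder_attributes(attrs: int) -> bool:
    """Return True if a Win32 attribute mask looks like a cloud placeholder.

    FILE_ATTRIBUTE_SPARSE_FILE alone is not enough: ordinary files can be sparse.
    """
    return bool(attrs & PLACEHOLDER_BITS)


def find_placeholders(directory: str) -> list[str]:
    """List files in `directory` that look like placeholders (hypothetical helper).

    On Windows, os.scandir fills in st_file_attributes from the directory
    listing, so no file is opened here. Returns [] on other platforms.
    """
    if sys.platform != "win32":
        return []
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                attrs = entry.stat(follow_symlinks=False).st_file_attributes
                if is_placeholder_attributes(attrs):
                    found.append(entry.path)
    return found
```

A caller could skip any path returned by `find_placeholders` (or hand it to a property-system reader instead) rather than passing it to the metadata parser, so a bulk scan never forces a download.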

Sync providers typically fill the 1 KB sparse file with a thumbnail, common properties for images, video, music, etc., and other information.

> If you try to open the file, the Windows kernel instructs the sync provider to download the file and make it fully available.

How would MetadataExtractor read the file if the kernel's going to transparently intercept the file system request and do the download?

I would be concerned that any fix here would be platform-specific.

Apple has the same concept (full versions are stored in iCloud), and on Windows both Apple and Microsoft do the same.
The information you're "allowed" to get is part of the placeholder file format; go beyond that, and it will download the file.

I'm sure it's implemented differently on Windows and Mac, but anything that reads metadata from many files will hit this issue.
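On the Mac side, a comparable check might look like the Python sketch below. It assumes that iCloud files whose contents are not stored locally carry the BSD "dataless" flag `SF_DATALESS` (0x40000000 in recent macOS `<sys/stat.h>`; worth verifying against your SDK), that `os.lstat` exposes it via `st_flags`, and that `lstat` does not materialize the file. `is_dataless_flags` and `is_dataless` are hypothetical helpers.

```python
import os
import sys

# BSD file flag for a "dataless" file whose contents are not stored locally.
# Value as found in <sys/stat.h> on recent macOS SDKs (assumption: verify locally).
SF_DATALESS = 0x40000000


def is_dataless_flags(flags: int) -> bool:
    """Return True if a BSD st_flags value has the dataless bit set."""
    return bool(flags & SF_DATALESS)


def is_dataless(path: str) -> bool:
    """Hypothetical helper: check a path's flags without reading its contents.

    This sketch only consults st_flags on macOS and returns False elsewhere.
    """
    if sys.platform != "darwin":
        return False
    return is_dataless_flags(os.lstat(path).st_flags)
```

The two sketches use different OS mechanisms (Win32 attributes versus BSD flags), which illustrates why any fix here would be platform-specific.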

It would help if you could find some analysis or documentation about the file formats.