ssec/sift

Allow user to load files with names that deviate from convention


When the file wizard is used with SIFT 1.1.3, a file cannot be loaded if its name does not match the naming convention expected by the specified reader (at least for GOES-R ABI Level 1b). There should be an option for the user to attempt to load the file with the specified reader regardless of whether the file name matches (with an appropriate notice of failure if applicable).

Are values from the file name subsequently used (things like day/time/band)? Or are these extracted from metadata within the file?

As Scott points out, Satpy needs certain bits of information from the filename, so it is not possible to load files that don't match a standard file naming convention that Satpy knows about. For example, Satpy needs to know that the NetCDF file it is looking at can provide a specific channel of data. If you gave it 16 files with unrecognized names, it would not know which file has C16 until it opened each one. In the low-level code this is very hard to implement in a flexible way that works for many use cases (some "file types" need to be loaded in different ways), so Satpy chooses to require that your filenames match what it knows about.
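To make the filename dependence concrete, here is a minimal sketch of how a channel and start time can be read from a conventional ABI L1b name without opening the file. This is not Satpy's actual code: the regular expression, the `parse_abi_name` helper, and the example filenames are assumptions made up for illustration, and the prefix in the second example is hypothetical.

```python
import re
from datetime import datetime

# Assumed approximation of the GOES-R ABI L1b naming convention,
# written for illustration only (not taken from Satpy).
ABI_L1B_PATTERN = re.compile(
    r"^OR_ABI-L1b-Rad(?P<scene>[FCM]\d?)-M(?P<mode>\d)C(?P<channel>\d{2})"
    r"_G(?P<satellite>\d{2})_s(?P<start>\d{14})_e\d{14}_c\d{14}\.nc$"
)


def parse_abi_name(filename):
    """Return metadata parsed from the filename, or None if it does not match."""
    match = ABI_L1B_PATTERN.match(filename)
    if match is None:
        return None
    info = match.groupdict()
    info["channel"] = int(info["channel"])
    # The first 13 digits are year, day of year, hour, minute, second;
    # the 14th digit (tenths of a second) is dropped here.
    info["start"] = datetime.strptime(info["start"][:13], "%Y%j%H%M%S")
    return info


good = "OR_ABI-L1b-RadF-M6C16_G16_s20200011200216_e20200011209536_c20200011210001.nc"
bad = "some_prefix_" + good  # hypothetical non-conforming name

print(parse_abi_name(good)["channel"])
print(parse_abi_name(bad))
```

With a conforming name the channel is known up front; with the prefixed name the match fails, which is roughly the situation the file wizard reports with the 'X' icon.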

Makes sense... Perhaps the solution is to look for a subset of the file name then? When I download a file from Google cloud, I have to remove a prefix at the beginning of the file name for SIFT to recognize it without the 'X' icon.
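As a rough sketch of the "subset of the file name" idea, a tool could search for the conventional name anywhere inside the downloaded name and use that part. Neither SIFT nor Satpy is known to do this; the pattern, the `conventional_name` helper, and the prefixed example name are all assumptions for illustration.

```python
import re
from pathlib import Path

# Assumed approximation of the conventional ABI L1b name, searched for
# anywhere in the file name instead of being anchored at its start.
CORE_NAME = re.compile(
    r"OR_ABI-L1b-Rad[FCM]\d?-M\dC\d{2}_G\d{2}_s\d{14}_e\d{14}_c\d{14}\.nc$"
)


def conventional_name(path):
    """Return the conventional part of a file name, or None if absent."""
    match = CORE_NAME.search(Path(path).name)
    return match.group(0) if match else None


# Hypothetical name with an extra prefix added by a download interface.
downloaded = (
    "extra_prefix_OR_ABI-L1b-RadC-M6C02_G16_"
    "s20200011201170_e20200011203543_c20200011204006.nc"
)
print(conventional_name(downloaded))
```

A user could rename or symlink the file to the returned name before loading it, which is the same manual prefix removal described above, just automated.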

> download a file from Google cloud

This is what I would consider a bug in Google's web interface (maybe AWS too). There are actually two buttons you can click on to download a file from the web interface. One gives you the "standard" filename and one gives you the odd one that you have. The actual files as stored on their cloud storage do not have this weird naming. Maybe we can contact them about fixing this (I doubt we'd be the first).

Talked to some pytroll folks about this because it sounded familiar. This issue used to exist on AWS but @simonrp84 contacted them and got them to fix it. Google has yet to implement this fix.

Yes, I contacted both Amazon and Google about this last year. Amazon quickly implemented a fix but Google never got back to me, so unfortunately files downloaded from their site still have incorrect file names.
I suggest getting in contact with Google; maybe if enough people raise the issue they will fix it! Could you download from AWS in the meantime?

Maybe alternatively we could just support the alternative Google file name? New users might be confused why they cannot load a file they downloaded from that source... I know it took me a few minutes.

I feel fairly strongly that these filenames should not be supported, at least on a logical/moral/best practice level, given that this is a bug on the GCP side (two of the three download methods on the pages give you the correct filename). If this is something Google won't fix then I'll be more open to changing it, but I would like some time to contact people for a fix.

@simonrp84 any idea who to contact about the Google bucket? I couldn't find any contact information on the different pages linked to the bucket.

@djhoese GCS have paid-only support, and I'm not paying to raise this issue ;-)
That said, NOAA have a big data support contact - I have just dropped them an email about this so let's see what they say.

I discussed with @djhoese today and he said he would investigate adding support for an optional suffix for custom files.

Just a small update: As mentioned above, this is now merged and released in Satpy (for quite a few months now), but handling it in any special way is not currently supported in SIFT. The "best" way to do this would probably require the overhaul of the layer list that @rayg-ssec and I had talked about (separate from the timeline view). If the layer list behaved better I'd be more comfortable adding more information to it (either as a tooltip or something similar), or maybe I should dump this information to the layer details pane. 🤔

Regardless, I'll keep this open since the overall goal is still not complete.