Refactor how we determine if granules are cloud-hosted
Opened this issue · 0 comments
jhkennedy commented
Note: the proposed refactor in this issue is a follow-up to #526 and its closing PR #839; I have summarized pertinent details/discussion here, but I highly suggest reading the discussion in the issue.
Background: Searches can have zero results, but in that case, this bit of code would break (before #839) because it indexes into an empty list:
earthaccess/earthaccess/search.py
Lines 521 to 523 in 0e4f392
This bit of logic seems weird in that:
- we'd expect
DataGranule
to know if the granule is cloud-hosted - this assumes every
DataGranule
's cloud-hosting status inresults
is the same, but it is possible to get a mix of cloud-hosted granules and not - only when accessing data do we care if they are all cloud-hosted or not, so it probably doesn't need to be evaluated here at all
We probably want to:
- move
DataGranules._is_cloud_hosted
to a public method in theDataGranule
class - @chuckwondo suggests making it a cached property (via
functools
)@cached_property def cloud_hosted(self) -> bool: ...
- possibly remove the
cloud_hosted
parameter fromDataGranule.__init__
in favor of the cached property, though we may want to be able to override this (how we're checking if it's cloud-hosted is brittle to DAACs doing metadata things weird)
This may have implications for both DataGranules.cloud_hosted
and DataCollections.cloud_hosted
, and those methods, as well as their downstream use, should be evaluated as part of any PR addressing this issue.