nsidc/earthaccess

Refactor how we determine if granules are cloud-hosted

Opened this issue · 0 comments

Note: the proposed refactor in this issue is a follow-up to #526 and its closing PR #839; I have summarized pertinent details/discussion here, but I highly suggest reading the discussion in the issue.


Background: Searches can have zero results, but in that case, this bit of code would break (before #839) because it indexes into an empty list:

response = get_results(self.session, self, limit)
cloud = self._is_cloud_hosted(response[0])

This bit of logic seems weird in that:

  • we'd expect DataGranule to know if the granule is cloud-hosted
  • this assumes every DataGranule's cloud-hosting status in results is the same, but it is possible to get a mix of cloud-hosted granules and not
  • only when accessing data do we care if they are all cloud-hosted or not, so it probably doesn't need to be evaluated here at all

We probably want to:

  • move DataGranules._is_cloud_hosted to a public method in the DataGranule class
  • @chuckwondo suggests making it a cached property (via functools)
    @cached_property
    def cloud_hosted(self) -> bool:
        ...
  • possibly remove the cloud_hosted parameter from DataGranule.__init__ in favor of the cached property, though we may want to be able to override this (how we're checking if it's cloud-hosted is brittle to DAACs doing metadata things weird)

This may have implications for both DataGranules.cloud_hosted and DataCollections.cloud_hosted, and those methods, as well as their downstream use, should be evaluated as part of any PR addressing this issue.