List input files in history metadata attribute
Closed this issue · 6 comments
At present, Zarrs produced by nc2zarr don't contain any indication of the source files from which they were generated. nc2zarr should optionally include a list of source files in the value of the Zarr's history
attribute on first generation, and update this value with the additional input files when appending to an existing Zarr.
Several metadata attributes may be updated in a CF-compliant way, see section Description of File Contents in the CF conventions.
Implementing this will also help with implementation of a related feature request from CloudFerro: the ability to ignore an input file when appending if it has already been ingested into the target Zarr. nc2zarr could check the input pathname or filename against the list in the history attribute before appending.
Implementing this will also help with implementation of a related feature request from CloudFerro: the ability to ignore an input file when appending if it has already been ingested into the target Zarr. nc2zarr could check the input pathname or filename against the list in the history attribute before appending.
We should not use metadata to make decisions about the data in the dataset. Whether a timeslice has already been processed or not should be detected by looking into the data: time coordinates. Once it is detected there are two options: ignore new data or replace existing. To replace an existing timeslice by a more up-to-date one is a valid use case we have in other scenarios. (Example: Same Sentinel 3 Level-2 data is beeing processed in a fast lane and another one that takes much more time but has higher data quality. When the second data arrives, the first is replaced.)