Embed the version information in the respective Loc-I datasets so users can easily find out which version it is and where to get more info
How do we describe version information in each of the Loc-I datasets (e.g. ASGS, Geofabric, GNAF)?
- each Loc-I dataset should be described with version info consistently in the metadata
- 2nd part: implement this for each Loc-I enabled dataset.
Add details to this issue ticket. We will need to document this somewhere so that it is communicated consistently to users.
I assume this will be an aspect of the dataset metadata - see CSIRO-enviro-informatics/asgs-dataset#8, CSIRO-enviro-informatics/geofabric-dataset#14, CSIRO-enviro-informatics/gnaf-dataset#2
In that context, there are a few ways to indicate version information:
- explicitly through a comprehensive provenance statement, with a date-time stamp - prov:wasGeneratedBy/prov:endedAtTime
- date-time stamp - dct:modified
  - a time-stamp will be needed if there is more than one update per day
- version number - pav:version

where pav: is http://purl.org/pav/
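
To make the three options concrete, here is a minimal sketch of how they might sit side by side in a dataset's metadata, rendered as Turtle from plain Python. The dataset URI and the version number are hypothetical placeholders, and reusing one timestamp for both dct:modified and prov:endedAtTime is an assumption, not something the datasets currently do.

```python
from datetime import datetime, timezone

PREFIXES = """\
@prefix dct: <http://purl.org/dc/terms/> .
@prefix pav: <http://purl.org/pav/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def version_metadata(dataset_uri: str, version: str, finished: datetime) -> str:
    """Render the three version indicators for one dataset as Turtle."""
    # Normalise to UTC so that several runs on one day stay unambiguous.
    stamp = finished.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"{PREFIXES}\n"
        f"<{dataset_uri}>\n"
        f'    pav:version "{version}" ;\n'
        f'    dct:modified "{stamp}"^^xsd:dateTime ;\n'
        f"    prov:wasGeneratedBy [\n"
        f"        a prov:Activity ;\n"
        f'        prov:endedAtTime "{stamp}"^^xsd:dateTime\n'
        f"    ] .\n"
    )


# Hypothetical dataset URI and version number, for illustration only.
example = version_metadata(
    "http://example.org/dataset/asgs",
    "1.0.0",
    datetime(2019, 3, 1, 2, 30, 0, tzinfo=timezone.utc),
)
print(example)
```

The sketch does not escape the version string, so it assumes simple values such as "1.0.0".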
My general assumption would be that
- a date-time stamp will be very specific, can be easily generated, and captured in the dct:modified element. This should be automatically updated when the ETL process is run
- alongside this, the link to the source dataset, the details of the ETL process with any run-specific parameters, and the time the process completed must all be recorded in the provenance information (prov:wasGeneratedBy and prov:wasDerivedFrom)
The link to the source data should be to a specific version.
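
As a starting point for discussion, a provenance record along these lines could be generated at the end of each ETL run. This is only a sketch: the URIs, the parameter names, and the ex: vocabulary used to hold run-time parameters are all hypothetical, since PROV-O itself does not define a property for parameters and the actual ETL parameters are not yet known.

```python
import json
from datetime import datetime, timezone


def literal(text: str) -> str:
    # json.dumps yields a double-quoted string whose backslash escapes are
    # also accepted by Turtle for ordinary text.
    return json.dumps(text, ensure_ascii=False)


def provenance_record(dataset_uri, source_version_uri, etl_uri, parameters, ended):
    """Render a prov:wasDerivedFrom / prov:wasGeneratedBy record as Turtle."""
    stamp = ended.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "@prefix ex: <http://example.org/etl#> .  # hypothetical vocabulary",
        "",
        f"<{dataset_uri}>",
        # Point at one specific version of the source, not the source in general.
        f"    prov:wasDerivedFrom <{source_version_uri}> ;",
        "    prov:wasGeneratedBy [",
        "        a prov:Activity ;",
        f"        prov:wasAssociatedWith <{etl_uri}> ;",
    ]
    # One blank node per run-time parameter, sorted for stable output.
    for name, value in sorted(parameters.items()):
        lines.append(
            f"        ex:parameter [ rdfs:label {literal(name)} ; "
            f"rdf:value {literal(str(value))} ] ;"
        )
    lines.append(f'        prov:endedAtTime "{stamp}"^^xsd:dateTime')
    lines.append("    ] .")
    return "\n".join(lines) + "\n"


# All URIs and parameters below are made up for illustration.
record = provenance_record(
    "http://example.org/dataset/asgs",
    "http://example.org/source/asgs/2016",
    "http://example.org/etl/asgs-loader",
    {"chunk_size": 1000, "region": "all"},
    datetime(2019, 3, 1, 2, 30, 0, tzinfo=timezone.utc),
)
print(record)
```

Once the real run-time parameters are known, the ex:parameter nodes could be replaced with whatever properties fit them best.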
@shaneseaton @ashleysommer @benjaminleighton Could I see an example of what the run-time parameters are, so that I can suggest how these could be recorded in a provenance record?
@benjaminleighton wrote on Slack:
On minimalist provenance for #16 I think getting this completely right first time is going to be tricky. Would sticking a pav:version in that we manually increment be sufficient for now?