harmonised versus as-is datasets
Closed this issue · 18 comments
If we have only ONE md-set for the harmonised and the as-is "view" on ONE dataset, we need to distinguish these two "views" in the resource locator links. How to do that?
In the AppProfile, ProtocolValue, Description, ...?
Erik
Sounds like a requirement for indication of profiles, also being discussed in OGC during the API work. The following link could be relevant, providing information on the profile element from JSON Hypertext Application Language: https://tools.ietf.org/html/draft-kelly-json-hal-07#section-5.6
As I presented during the MIG-T, my point of view on this topic is that INSPIRE should appear as a "transparent layer" upon the data provided by the MS National Geoportal, by integrating the information in their current catalog, and embrace all the dataset metadata already available instead of duplicating it.
The implication of having two metadata is: how do you keep it updated and aligned?
Based on my current experience with the Geoportal, I see that the metadata document is updated quite frequently (e.g. initially by fixing errors and typos, then integrating keywords, then updates from the validation workflow, to start again maybe with some new keyword requirements, or the conformity section).
Therefore, I would rather keep away from the scenario presented with the above diagram, and try to find a solution within the ResourceLocator elements.
So, what about using a (surely new) agreed codelist, maybe in the description element, to address this problem?
Something like
<gmd:description>
<gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/SpatialDataSetProfile/harmonized-dataset">INSPIRE harmonized dataset</gmx:Anchor>
</gmd:description>
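A minimal sketch of how a client could use such an anchor to tell the two links apart, assuming the codelist URI proposed above were agreed (it is not an existing registry entry). The sample record, the example.org URLs and the function name `split_locators` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmx": "http://www.isotc211.org/2005/gmx",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Codelist value proposed in this thread; an assumption, not a registered code.
HARMONISED = ("http://inspire.ec.europa.eu/metadata-codelist/"
              "SpatialDataSetProfile/harmonized-dataset")

# Hypothetical metadata fragment with two resource locators.
SAMPLE = f"""\
<gmd:MD_DigitalTransferOptions
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gmx="http://www.isotc211.org/2005/gmx"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:onLine>
    <gmd:CI_OnlineResource>
      <gmd:linkage><gmd:URL>https://example.org/wfs/harmonised</gmd:URL></gmd:linkage>
      <gmd:description>
        <gmx:Anchor xlink:href="{HARMONISED}">INSPIRE harmonized dataset</gmx:Anchor>
      </gmd:description>
    </gmd:CI_OnlineResource>
  </gmd:onLine>
  <gmd:onLine>
    <gmd:CI_OnlineResource>
      <gmd:linkage><gmd:URL>https://example.org/wfs/as-is</gmd:URL></gmd:linkage>
    </gmd:CI_OnlineResource>
  </gmd:onLine>
</gmd:MD_DigitalTransferOptions>
"""


def split_locators(xml_text):
    """Return (harmonised_urls, other_urls) based on the description anchor."""
    root = ET.fromstring(xml_text)
    harmonised, other = [], []
    for res in root.iterfind(".//gmd:CI_OnlineResource", NS):
        url = res.findtext("gmd:linkage/gmd:URL", default="", namespaces=NS).strip()
        anchor = res.find("gmd:description/gmx:Anchor", NS)
        href = anchor.get(XLINK_HREF) if anchor is not None else None
        # Links without the agreed anchor are treated as "not flagged as harmonised".
        (harmonised if href == HARMONISED else other).append(url)
    return harmonised, other
```

Calling `split_locators(SAMPLE)` separates the link carrying the anchor from the one without it. Whether unflagged links should be read as "as-is" would still have to be agreed.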
@dartasensi This is not the case. We have two different datasets, the "as-is" and the harmonised. In most cases the harmonised dataset contains less information, has another data model, and a different CRS, update cycle etc.
So it's not a duplicate of metadata that should be aligned. They are different metadata records, with different content, describing different datasets. And in our opinion each dataset should have its own metadata. So we are not able (at this moment) to see the view and download service of harmonised data as one of the online resources of the "as-is" data.
@idevisser thank you for sharing your experience and these insightful remarks.
I see your point, so I would review our opinion internally to try to include your suggestions and we will come back with a new proposal.
comment edited in order to clarify some minor point
@oberseri going back to your original question, and then your diagram above... What if we choose this approach, especially on the red box? (credits to @fabiovin for the idea)
Following the 66th MIG-T (especially the discussion on the co-existence of "as-is" and harmonised data), we think that this approach would be effective in solving this.
In this scenario, the Geoportal will collect INSPIRE datasets (both harmonised and not yet harmonised). Even if the diagram seems to express only this, it is clear that every dataset that fulfils the INSPIRE requirements should be included in a virtual CSW or through an OGC filter, and made available by the National Geoportal curators.
All the involved metadata will have a dedicated attribute/property within a ResourceLocator, so the harvested metadata can keep a reference to the more expressive but "not-harmonised/not INSPIRE" dataset metadata (i.e. avoiding pointing directly to the data without the context of its metadata). This information would also appear in the EU INSPIRE Geoportal.
Moreover, I think this approach (especially the relation in the red box) could also cover the current scenario indicated above by @idevisser, quoting:
They are different metadata records, with different content, describing different datasets. And in our opinion each dataset should have its own metadata.
With the dashed arrow, I would suggest that, since that metadata will be "not obliged" to INSPIRE requirements, it can at least follow the idea and offer a ResourceLocator to the harmonised description. It will be up to the National Catalogue to integrate this.
If we agree on this approach, the next logical step will be: which element (within the ResourceLocator) can we pick in order to express this "relation/attribute"?
Note: I recreated your diagram in PowerPoint and uploaded it here. Feel free to use it, edit it and commit your changes under the /proposal folder of your country.
Looks like this diagram is not describing the situation of one MD and one dataset, as is the case in the original question...
I think this proposal needs some more explanation....
Is only the harmonised metadata description available in the front end of the geoportal, with the as-is data only findable through the dataset metadata of the harmonised data? Is this in line with the Commission's point of view, expressed in the last MIG, that even if there is harmonised data, the as-is data must always remain available?
Or, if both metadata records are harvested and available in the front end of the geoportal, why is this reference to the dataset metadata in the resource locator proposed?
A reference to the metadata of another dataset in the resource locator element is not expected at that point and not quite clear outside the INSPIRE community.
@MarieLambois From a standardisation point of view, in which metadata element should the relation to the source dataset be made?
From a standardization point of view, I would say that the resource locator is fine for the case of one MD for both datasets. Purely ISO speaking, I would say that the intention would be to distinguish both links based on unitsOfDistribution.
For the case with 2 MDs, I think that a link to the metadata in the Resource Locator is not correct semantically speaking (because it is supposed to be a link to the resource, and in this case you consider that it is a different resource). Again purely ISO speaking, I would say that it should go under the lineage section (because the as-is data is the source of your harmonised data). For me a LI_Source/description would fit. An implementation of the lineage statement as an Anchor would be another solution.
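As a rough illustration of the lineage option, the sketch below reads LI_Source descriptions from an ISO 19139 lineage fragment and returns the label plus, when the description is a gmx:Anchor, the link it carries. The sample record, the example.org URL and the function name `lineage_sources` are hypothetical; nothing here is an agreed INSPIRE encoding.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmx": "http://www.isotc211.org/2005/gmx",
    "gco": "http://www.isotc211.org/2005/gco",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical lineage of a harmonised dataset pointing back to its as-is source.
SAMPLE = """\
<gmd:LI_Lineage
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gmx="http://www.isotc211.org/2005/gmx"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:source>
    <gmd:LI_Source>
      <gmd:description>
        <gmx:Anchor xlink:href="https://example.org/catalogue/records/as-is-roads">National road network (as-is)</gmx:Anchor>
      </gmd:description>
    </gmd:LI_Source>
  </gmd:source>
</gmd:LI_Lineage>
"""


def lineage_sources(xml_text):
    """Return (label, href) pairs; href is None for plain-text descriptions."""
    root = ET.fromstring(xml_text)
    sources = []
    for desc in root.iterfind(".//gmd:LI_Source/gmd:description", NS):
        anchor = desc.find("gmx:Anchor", NS)
        if anchor is not None:
            sources.append(((anchor.text or "").strip(), anchor.get(XLINK_HREF)))
        else:
            text = desc.findtext("gco:CharacterString", default="", namespaces=NS)
            sources.append((text.strip(), None))
    return sources
```

A plain gco:CharacterString description still works for human readers; the Anchor variant only adds a machine-followable link.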
From an INSPIRE point of view, I am still not convinced that we really need to implement this link in a standardized way. As far as only the INSPIRE MD goes to the European portal (> indicators are OK) and a human user can read/understand where/what the original data is (this can easily be written in the lineage statement), I think all use cases are fulfilled. Again, no need to overstandardize.
@idevisser: The reference to the harmonised dataset could be used to clearly state that there also exists a harmonised version of this data "theme"; otherwise it would decrease your monitoring results.
@idevisser 2: I think references to "websites with further instructions" or additional information should be no problem.
I wonder about the model of publishing AS-IS data. I'm not sure of the full use case here, but I'll add some thoughts. Would it not be of interest for users to find all data related to a specific theme, so that all relevant resources are harvested to the INSPIRE Geoportal? We would then publish for a specific theme:
- Harmonized Inspire data
- Non-Harmonized Inspire data
- As-Is data
It should also be quite easy to set a filter in Inspire-Geoportal UI for each of these cases.
If we don't publish As-Is data to the Inspire Geoportal
To separate these when calculating monitoring results:
- Harmonized Inspire data -> Does have a conformance report referencing Implementing Rules 1089/2010 with PASS=TRUE
- Non-Harmonized Inspire data -> Does have a conformance report referencing Implementing Rules 1089/2010 with PASS!=TRUE
- As-is data -> Shall NOT have a conformance report referencing Implementing Rules 1089/2010 but should have a reference to another (national?) specification
Only datasets that reference 1089/2010 should be part of the monitoring calculations. Then no separate keyword is needed.
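The rule above can be sketched roughly as follows. This is only an illustration under assumptions: the record model, the names `classify` and `monitored`, and the placeholder label used to identify the interoperability implementing rules are all hypothetical and do not reflect how the Geoportal actually computes indicators.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder label for "the interoperability implementing rules";
# real records would carry a citation title/date instead (assumption).
INTEROP_IR = "interoperability-implementing-rules"


@dataclass
class ConformanceReport:
    specification: str
    passed: Optional[bool]  # None means "not evaluated"


@dataclass
class Record:
    title: str
    reports: List[ConformanceReport] = field(default_factory=list)


def classify(record, spec=INTEROP_IR):
    """Classify a record by its conformance reports against `spec`."""
    results = [r.passed for r in record.reports if r.specification == spec]
    if not results:
        return "as-is"  # no report against the implementing rules
    return "harmonised" if any(p is True for p in results) else "non-harmonised"


def monitored(records, spec=INTEROP_IR):
    """Keep only records that reference `spec`, i.e. those counted in monitoring."""
    return [r for r in records if classify(r, spec) != "as-is"]


records = [
    Record("Roads (INSPIRE)", [ConformanceReport(INTEROP_IR, True)]),
    Record("Railways (INSPIRE, pending)", [ConformanceReport(INTEROP_IR, False)]),
    Record("Roads (national model)", [ConformanceReport("national-spec", True)]),
]
```

With these three sample records, only the first two would enter the monitoring calculation; the national-model record is left out without needing an extra keyword.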
I also agree on Ine's comment above that we shall separate metadata-records for INSPIRE Harmonized datasets and AS-IS datasets since they are in most cases completely separate.
I agree with Marie's comment above: I'm not sure we need the resource locator to relate the Harmonized and AS-IS data.
I also think that multiple AS-IS datasets could exist for a single Harmonized Inspire dataset, which could make this tedious to manage.
If we need the relations anyhow, I would prefer to use MD_AggregationInfo to manage relations between datasets, even though this would extend the strict definition of this class. On the other hand, this would introduce a completely new class to Inspire MD, which is not preferred.
At least there are two open questions:
For countries providing two (or more) MD for harmonised AND as-is data (not harmonised):
How to prevent bad monitoring results for datasets with 1089/2010-PASS=FALSE although there is an additional harmonised dataset?
For countries providing only one MD for harmonised AND as-is data:
How to indicate which URL links to the harmonised and which to the as-is dataset/distribution? Do we need this indication?
For countries providing two (or more) MD for harmonised AND as-is data (not harmonised):
How to prevent bad monitoring results for datasets with 1089/2010-PASS=FALSE although there is an additional harmonised dataset?
I'm not sure here, but my thinking would be as follows:
If the two datasets have overlapping content, meaning that all data that should be published for that theme exists in the dataset that is approved (e.g. both contain the road network), then I would set the second as AS-IS with no conformance report. When it conforms to harmonization, a conformance report can be added. This is because this dataset is not required by Inspire; it is just an additional dataset.
But if the two datasets have separate content, so that both datasets are needed to cover the whole content of the specific theme (e.g. one contains roads and the other the rail network), then I would keep the conformance report referring to the implementing rules with Not pass. In this case there is actually harmonized data missing for the specific theme.
For countries providing only one MD for harmonised AND as-is data:
How to indicate which URL links to the harmonised and which to the as-is dataset/distribution? Do we need this indication?
I would not allow this, since the datasets are probably very different in content, structure and quality. I think they should be handled as two separate datasets.
The Commission stated that an as-is data set and a data set available in line with the interoperability rules (under harmonized conditions) are instances or representations of the same data set, not two data sets. There should be one metadata record saying there are two versions of the data set: the harmonised one and the as-is one (original). We need to further discuss how this should be appropriately documented in metadata to make sure data sets are correctly accounted for without negative impact on the relevant calculated indicator. The Chair proposed to further discuss this also involving the experts from the permanent technical subgroup of the MIG (MIG-T).
(Summary report of the 10th INSPIRE MIG expert group meeting, 20 June 2019, Brussels; page 3)
Quick comment even though I was not actively part of the discussion.
If we enforce that notion of ‘as-is’ data, why write in the discussed amended regulation No 1089/2010 - Article 4: “When exchanging spatial objects, Member States shall comply with the definitions and constraints set out in the Annexes and provide values for all attributes and association roles set out for the relevant spatial object types and data types in the Annexes”?
-> if those datasets appear in metadata catalogues, we are actually exchanging spatial objects, no?
-> I feel a contradiction here
The only scenario where I would understand this (note: morning comment, thus caffeine level is low) is when people put in their catalogues MD about FeatureTypes not falling under the INSPIRE regulation.
On the rest:
- +1 on @MarieLambois, when she writes: ‘From an INSPIRE point of view, I am still not convinced that we really need to implement this link in a standardized way. As far as only the INSPIRE MD goes to the European portal (> indicators are OK) …’
- +1 on part of this issue being about 2 representations of the same datasets -> thus +1 @KathiSchleidt reference to profiles discussions
From the MIG-T meeting, these three scenarios were listed:
- One data set => one metadata in the national CSW, aggregating different distribution formats (i.e. in one MD, "as-is" and INSPIRE harmonized)
- One dataset => two metadata in the national CSW, each one dedicated to a specific distribution format of the data (for each, "as-is" or INSPIRE harmonized)
- One dataset => still two metadata in the national CSW, but collected with a specific OGC filter during the harvest by the INSPIRE Geoportal
The Danish way:
- Two data sets => two metadata in the national CSW but only one is collected with a specific OGC filter during the harvest by the INSPIRE Geoportal
The OGC filter we use in the harvesting:
<ogc:PropertyIsEqualTo>
<ogc:PropertyName>Subject</ogc:PropertyName>
<ogc:Literal>INSPIRE</ogc:Literal>
</ogc:PropertyIsEqualTo>
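For context, a filter like this is normally sent inside a CSW GetRecords request. The sketch below builds such a request body with the standard library only. It is a hedged illustration, not the actual Danish harvesting configuration: the function name `build_getrecords`, the `csw:Record` type name and the `full` element set are assumptions, and no request is actually sent.

```python
import xml.etree.ElementTree as ET

CSW = "http://www.opengis.net/cat/csw/2.0.2"
OGC = "http://www.opengis.net/ogc"


def build_getrecords(property_name="Subject", literal="INSPIRE"):
    """Build a CSW 2.0.2 GetRecords body wrapping a PropertyIsEqualTo filter."""
    ET.register_namespace("csw", CSW)
    ET.register_namespace("ogc", OGC)
    root = ET.Element(
        f"{{{CSW}}}GetRecords",
        {"service": "CSW", "version": "2.0.2", "resultType": "results"},
    )
    # typeNames and ElementSetName are illustrative choices (assumptions).
    query = ET.SubElement(root, f"{{{CSW}}}Query", {"typeNames": "csw:Record"})
    ET.SubElement(query, f"{{{CSW}}}ElementSetName").text = "full"
    constraint = ET.SubElement(query, f"{{{CSW}}}Constraint", {"version": "1.1.0"})
    flt = ET.SubElement(constraint, f"{{{OGC}}}Filter")
    equal = ET.SubElement(flt, f"{{{OGC}}}PropertyIsEqualTo")
    ET.SubElement(equal, f"{{{OGC}}}PropertyName").text = property_name
    ET.SubElement(equal, f"{{{OGC}}}Literal").text = literal
    return ET.tostring(root, encoding="unicode")


request_body = build_getrecords()
```

Only records whose Subject equals INSPIRE would match, so the as-is records stay in the national catalogue without being harvested.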
We consider the national (as-is) data set and the "corresponding" INSPIRE data set as two separate data sets that need to be documented by their own metadata records.
The two data sets (as-is and INSPIRE) are so different in structure and content that, even though they represent the same real-world object, they are conceptually treated as different data sets.
I can give examples for a Danish as-is data set: Cadastral parcel and boundary, containing many properties/attributes, relations to other feature types etc. that are not part of the INSPIRE CP model. Many properties cannot be mapped to the INSPIRE schema.
And vice versa: an as-is data set that is very simple, for instance a surface with a limited number of attributes, is much more complex in the INSPIRE version (if it has a Geographical name...) and so forth. There are cases where the harmonized version is supplemented with much additional information from code lists etc. that cannot be found in the as-is version.
The Danish way of metadata for two data sets, 1) as-is and 2) INSPIRE harmonized, looks like:
In the end, it is the data providers that decide on how to organize their data into datasets.
In Spain we have only one catalogue with all the official INSPIRE datasets and services: the CODSI Catalogue.
Then we have another catalogue with all the geospatial information of all nodes of the Spatial Data Infrastructure of Spain (harmonised and non-harmonised metadata).
So we don't need any filter for the harvesting to the INSPIRE Geoportal, since we want to publish everything we have in CODSI.
Nevertheless, we think it can be useful to introduce a keyword in the metadata files in order to harvest the harmonised datasets in case that is needed.
The issue has been discussed in depth, and the impression is that it doesn't affect the approach for the data-service linking simplification: whether there is one dataset or two, the simplification approach (as defined in the final proposal of the good practice) is applicable in both cases.
The discussion could continue with reference to the metadata TGs, as the issue could affect the requirements and recommendations defined there.
Consequently the issue can be closed.