Research-profile of dataset metadata standards

Apache License 2.0

Metadata profiles for research dataset discovery and assessment

Adapted from content originally prepared through the W3C DXWG

Generic metadata vocabularies such as DCAT and schema.org are used in indexes to support dataset discovery. Basic DCAT supports the indication of dataset semantics through:

  • dcat:keyword - which has a literal value
  • dcat:theme - which uses a skos:Concept from a controlled vocabulary
  • dct:subject - intended to be used with a link to a value in a controlled vocabulary
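
As a minimal sketch of how these three properties differ in practice, the following builds a DCAT dataset description as JSON-LD using only the Python standard library. The dataset IRI, the title, and the concept IRIs under example.org are hypothetical placeholders, not values from a real catalogue.

```python
import json

# Hypothetical IRIs and labels, for illustration only.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/dataset/stream-temperature",
    "@type": "dcat:Dataset",
    "dct:title": "Stream temperature observations",
    # dcat:keyword takes plain literal values.
    "dcat:keyword": ["water temperature", "streams"],
    # dcat:theme points to a skos:Concept from a controlled vocabulary.
    "dcat:theme": {"@id": "https://example.org/themes/hydrology"},
    # dct:subject links to a value in a controlled vocabulary.
    "dct:subject": {"@id": "https://example.org/subjects/water-temperature"},
}

print(json.dumps(dataset, indent=2))
```

None of the three properties says whether "water temperature" is the thing observed, the feature it was observed on, or the method used, which is the gap discussed next.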

The semantics of these properties are relatively non-specific, particularly in relation to the properties associated with scientific observations that usually involve a well-defined protocol or sensor. Additional predicates (or specializations of the generic predicates) might be used to provide more explicit descriptions of datasets, to support cross-domain research data discovery and initial assessment.

Suitable additional properties typically correspond to slots in more specialized generic and domain-specific metadata standards such as schema.org, W3C SSN, DDI/DISCO, DATS, and ISO 19115 (Geospatial). Information should be copied up from the specialized metadata records to support description using a cross-domain research data description standard.

Related work

Candidate properties

| Vocabulary | Prefix | Properties |
|---|---|---|
| W3C SSN | ssn: sosa: ssn-ext: | sosa:observedProperty sosa:hasFeatureOfInterest sosa:usedProcedure sosa:madeBySensor sosa:phenomenonTime ssn-ext:hasUltimateFeatureOfInterest |
| W3C Data Cube | qb: | qb:concept qb:componentProperty (also sub-properties qb:dimension qb:measure qb:attribute) |
| DDI DISCO | disco: | disco:variable disco:universe disco:analysisUnit |
| DATS | dats: | dats:isAbout dats:producedBy dats:types dats:dimensions |
| schema.org | s: | s:variableMeasured s:measurementTechnique s:instrument s:usesDevice s:object |
| ISO 19115 | iso19115: | iso19115:extent iso19115:rangeElementDescription iso19115:instrument iso19115:acquisitionInformation |

Approximate equivalences

  • ssn-ext:hasUltimateFeatureOfInterest ~ disco:universe ~ iso19115:extent
  • sosa:hasFeatureOfInterest ~ disco:analysisUnit ~ dats:isAbout ~ s:object
  • sosa:observedProperty ~ qb:concept ~ disco:variable ~ dats:dimensions ~ iso19115:rangeElementDescription ~ s:variableMeasured
  • sosa:madeBySensor ~ iso19115:instrument ~ s:instrument (also s:usesDevice for medical applications)
  • sosa:usedProcedure ~ dats:producedBy ~ iso19115:acquisitionInformation ~ s:measurementTechnique
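
One way to put these equivalences to work is as a crosswalk that copies values up from a specialized record into a small set of generic slots. The sketch below is only illustrative: the slot names (such as `observed_property`), the `copy_up` function, and the sample record are all assumptions introduced here, and the equivalences are approximate, so a real mapping would need per-standard review.

```python
# Slot names are illustrative assumptions; the property lists mirror the
# approximate equivalences given above.
CROSSWALK = {
    "ultimate_feature_of_interest": [
        "ssn-ext:hasUltimateFeatureOfInterest", "disco:universe", "iso19115:extent",
    ],
    "feature_of_interest": [
        "sosa:hasFeatureOfInterest", "disco:analysisUnit", "dats:isAbout", "s:object",
    ],
    "observed_property": [
        "sosa:observedProperty", "qb:concept", "disco:variable", "dats:dimensions",
        "iso19115:rangeElementDescription", "s:variableMeasured",
    ],
    "sensor": [
        "sosa:madeBySensor", "iso19115:instrument", "s:instrument", "s:usesDevice",
    ],
    "procedure": [
        "sosa:usedProcedure", "dats:producedBy", "iso19115:acquisitionInformation",
        "s:measurementTechnique",
    ],
}

# Reverse index: specialized property -> generic slot.
SLOT_OF = {prop: slot for slot, props in CROSSWALK.items() for prop in props}


def copy_up(record):
    """Collect values from a specialized record under the generic slots."""
    summary = {}
    for prop, value in record.items():
        slot = SLOT_OF.get(prop)
        if slot is None:
            continue  # no listed equivalence for this property
        values = value if isinstance(value, list) else [value]
        summary.setdefault(slot, []).extend(values)
    return summary


# Hypothetical DDI/DISCO-style record, for illustration only.
ddi_record = {
    "disco:variable": ["household income", "age"],
    "disco:universe": "Residents of region X",
    "disco:analysisUnit": "household",
    "dct:title": "Example survey",
}

summary = copy_up(ddi_record)
print(summary)
```

Properties without a listed equivalence, such as `dct:title` here, are skipped rather than guessed at.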

Metadata for 'rectangular' datasets

Starting points

DCAT augmented for research data

DCAT research profile

In this strawman, the DCAT Dataset description is augmented with additional properties and classes, taken from W3C RDF vocabularies where available. The set of additional properties was identified in the workshop on Interoperability of Metadata Standards in Cross-Domain Science, Health, and Social Science Applications, based on requirements from the three pilot projects of the CODATA Data Integration Initiative.

res: is a new namespace for research dataset metadata

schema.org selected for research data

schema.org research profile

The properties shown in this strawman are associated with the schema.org classes indicated, as listed at s:Dataset, s:Action. Under schema.org semantics there is no limitation on use of properties to describe any Type, but this diagram indicates the normal expectations, as given by the s:domainIncludes and s:rangeIncludes descriptors for the properties shown.

s:Action is a potential super-class for act-of-observation. It is introduced here to show some useful properties of research data that are not normally associated with "continuant" entities, but are likely to be useful in research dataset discovery. (Would s:AchieveAction or s:CreateAction be better?)
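
A rough sketch of this split between dataset-level and action-level properties, written as schema.org JSON-LD built with the Python standard library. All names and IRIs are hypothetical, and using `result` to link the act of observation to the dataset is an assumption made for illustration, not something the strawman specifies.

```python
import json

# Properties that schema.org normally expects on a Dataset.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.org/dataset/stream-temperature",  # hypothetical IRI
    "name": "Stream temperature observations",
    "variableMeasured": "water temperature",
    "measurementTechnique": "in-situ thermistor logging",
}

# Properties that schema.org normally expects on an Action.
observation = {
    "@context": "https://schema.org/",
    "@type": "Action",
    "instrument": {"@type": "Thing", "name": "Thermistor logger"},
    "object": {"@type": "Thing", "name": "Example Creek"},
    # Assumption: "result" ties the act of observation to the dataset.
    "result": {"@id": dataset["@id"]},
}

print(json.dumps([dataset, observation], indent=2))
```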

Other concerns

from DDI

  • collectionFrequency (different to accrualPeriodicity)
  • summary statistics ...
  • a questionnaire is a kind of sensor or instrument

from Geospatial

  • spatial resolution
  • spatial aggregation level

Other important sources to correlate

  • THREDDS/netCDF