OpenGeoMetadata/shared-repository

Description of resources available to a layer

Closed this issue · 17 comments

This is a proposal to add an optional file type to OpenGeoMetadata record directories. The file could be named resources.json and would list online resource locations for various web services. The proposed schema is similar to how the GeoBlacklight schema defines references using dct:references, with the OGC CatInterop table used as a reference for defining the available resource types.

This would be a way to provide various online resources for a given record.

An example resources.json:

{
  "http://schema.org/url":"http://purl.stanford.edu/bd427dr8948",
  "http://www.loc.gov/mods/v3":"http://purl.stanford.edu/bd427dr8948.mods",
  "http://www.isotc211.org/schemas/2005/gmd/":"http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:bd427dr8948/iso19139.xml",
  "http://www.w3.org/1999/xhtml":"http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:bd427dr8948/default.html",
  "http://www.opengis.net/def/serviceType/ogc/wfs":"https://geowebservices-restricted.stanford.edu/geoserver/wfs",
  "http://www.opengis.net/def/serviceType/ogc/wms":"https://geowebservices-restricted.stanford.edu/geoserver/wms"
}

Resources added could be things like:

  • WMS Services
  • WFS Services
  • Direct file download
  • Metadata viewing/downloading
  • ArcGIS Server Services
  • Preview Images

See the Cat Interop list for more ideas

I like the impulse here, which I assume is to provide a simple list of associated resources in an agreed-upon form, so that consumers don't have to parse multiple documents in different formats to scrounge for links.

I’m less sure about the implementation. Here are a few thoughts/questions:

What if there are multiple resources of a particular type? You could default to a single value and roll over to an array if there are multiple values, but I think this will trip up a lot of JSON libraries. There could easily be multiple HTML, PDF, or even preview image URLs for a given layer, which may or may not have the same function (e.g., a PDF that is a preview of the resource vs. a PDF of the metadata).
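Just to make the shape concrete, a roll-over-to-array encoding might look like this (the URLs here are made up):

{
  "http://schema.org/url":"http://example.org/catalog/some-layer",
  "http://www.w3.org/1999/xhtml":[
    "http://example.org/metadata/some-layer/default.html",
    "http://example.org/preview/some-layer.html"
  ]
}

A consumer then has to check whether each value is a string or an array before using it, which is exactly the sort of thing that trips up naive parsing code.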

"http://www.w3.org/1999/xhtml":"http://opengeometadata.stanford.edu/metadata/edu.stanford.purl/druid:bd427dr8948/default.html",

In the above case, what is the URI key really telling me about the resource type? As a human reader I can infer from the URL value that this is a metadata link, but it could just as well be a download page, a preview page, or a library catalog page. I think some semantic indication would be useful where there is ambiguity. Given that this is a JSON resource, my expectation is that it should be machine-processable.
For the OGC services, a layer name (feature type name) is really necessary to do anything with this resource. Virtually everything else can be gleaned by querying the capabilities document (if it’s actually available). Maybe something like the following would be more useful?

"http://www.opengis.net/def/serviceType/ogc/wfs":"https://geowebservices-restricted.stanford.edu/geoserver/wfs?typeNames=namespace:mylayername",
"http://www.opengis.net/def/serviceType/ogc/wms":"https://geowebservices-restricted.stanford.edu/geoserver/wms?layers=namespace:mylayername"
I dislike that this is not a valid query to a resource, but neither is the initial proposal.

Putting the resource name in a separate field creates other problems/complexities, since it is possible that feature services and tile services might have different names.

What about something like this for the namespace and layername?

{
  "http://www.opengis.net/def/serviceType/ogc/wfs":"https://geowebservices-restricted.stanford.edu/geoserver/wfs",
  "http://www.opengis.net/def/serviceType/ogc/wfs#namespace":"layerNamespace",
  "http://www.opengis.net/def/serviceType/ogc/wfs#layerName":"layerName"
}

that works!

edit: I have mixed feelings about the namespace being separate from the layer name, mostly because the namespace is a GeoServer-specific thing rather than part of the OGC specification. Otherwise it is a reasonable solution.


I've been doing some thinking about converting ISO 19139 and FGDC directly to GeoBlacklight schema (right now at Stanford, we go from MODS to GeoBlacklight). We have GeoCombine, which is an initial stab at doing this.

The issue is that the references (discussed above) are not the only metadata needed for the GeoBlacklight schema that is not included in the ISO or FGDC metadata.

We have the following required schema elements in GeoBlacklight that need to be present but are not extractable from the ISO/FGDC:

  • dc:identifier: unique identifier (FGDC only)
  • dct:references: discussed above
  • layer:geom_type: such as Polygon, Line, Raster, etc.
  • layer:id: this is the WxS layer identifier
  • layer:slug: this is the unique slug for bookmarkable URLs in GeoBlacklight
  • dc:rights: either Restricted or Public
  • dc:format: the file format like Shapefile, GeoTIFF, etc. (optional)

So, I'm proposing a layer.json metadata file in each layer directory that holds these metadata.
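To make that concrete, here is a minimal sketch of what layer.json might contain; the key names are just a strawman rather than a settled schema, and the values are illustrative:

{
  "identifier":"http://purl.stanford.edu/bd427dr8948",
  "references":{
    "http://www.opengis.net/def/serviceType/ogc/wms":"https://geowebservices-restricted.stanford.edu/geoserver/wms",
    "http://www.opengis.net/def/serviceType/ogc/wfs":"https://geowebservices-restricted.stanford.edu/geoserver/wfs"
  },
  "geom_type":"Polygon",
  "layer_id":"druid:bd427dr8948",
  "slug":"stanford-bd427dr8948",
  "rights":"Restricted",
  "format":"Shapefile"
}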

The outcome would be that, with the iso19139.xml or fgdc.xml plus the layer.json metadata, we could generate the geoblacklight.json, thus enabling direct harvesting from OpenGeoMetadata into GeoBlacklight.

I'm not sure whether there is additional metadata required by OGP that would be a fit for layer.json. I know the OGP Metadata group is working on best practices for encoding these missing metadata for web services and the like. The OGP schema includes the full FGDC metadata, so I'm assuming the OGP schema is largely extracted from it.

Also, the format need not be JSON; it could be RDF or XML -- I just think JSON is the easiest to work with implementation-wise.

I can code up some examples if this sounds like a workable approach...

Also, Kim and I have a metadata profile that we're using at Stanford for fields that are required by either the ISO standard, or by GeoBlacklight (coded as "SDI" in the table).

[Screenshot: Stanford metadata profile table, 2015-02-19]

Darren,

These are essentially the same requirements that we have for OGP. I think that JSON is great if an external file is required.

A few points/questions:

  • The web services layer identifier may or may not be the same for all web services. It likely is, but it is not required to be.
  • geom_type should be extractable from FGDC; vector vs. raster should be extractable from ISO.
  • layer:slug — how are you handling imported metadata that has no permanent URL?
  • format — is this multi-valued, then? Should formats be associated with references?
  • How do you/we handle metadata when there is no layer.json present?
  • I would suggest also adding an institution/organization source field.


Thanks, Chris.

So, WMS might have a different layer id than WFS? Jeez :). We don't handle that case currently in GeoBlacklight -- just a single layer:id across all WxS services. Not sure how to handle the representation of this because dct:references is meant to map URIs to URLs only, not URIs to values.

geom_type is Polygon, Line, Point, or Raster. We don't have "Vector". In our metadata workflow, we've had to use OGR to detect the geom_type directly from the data. The ISO metadata didn't have it. @kimdurante? We use the geom_type for a categorical facet so it's basically an enumerated type. See https://github.com/geoblacklight/geoblacklight-schema/blob/6592a242a5af5ad1d8e1b96c97cc916ba4c47398/geoblacklight-schema.json#L70

The layer:slug is used by GeoBlacklight to provide a single URL to the GeoBlacklight page. Our slugs use our UUIDs, like so:

https://earthworks.stanford.edu/catalog/stanford-ff128mp5307

but the slug could be "stanford-area-names-dari-mazar-i-sharif" or something based on a title. The caveat is that the slugs must be unique across a GeoBlacklight instance.

Good point about dct:provenance (Institution). We hardcode ours to Stanford in our XSLT. (see https://github.com/geoblacklight/geoblacklight-schema/blob/master/lib/xslt/iso2mods.xsl#L863)

See this comment #13 (comment)

Would something like that approach help with the different layer ids?

Good catch on the institution Chris! I will add it to the table.

Yeah, there is no explicit reference to geometry type in ISO, just vector or raster.
Chris is correct that FGDC does provide these distinctions, and I think I got them correct here:
https://github.com/geoblacklight/geoblacklight-schema/blob/master/lib/xslt/fgdc2geoBL.xsl

Yesterday, I spoke with @drh-stanford about generating the format element. In MODS, we generally use MIME types for these, so that's what is currently in the stylesheets. I will remove these....
In ISO, format designations are not always included - ArcCatalog auto-generates this field but users of something like GeoNetwork would have to add this field manually and often it is left blank. So, auto-generating these using a standard set of format terms might be a good choice.
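Purely as an illustration, that standard set could be driven by a small lookup from whatever MIME types appear in the source records to display terms; the entries below are hypothetical rather than an agreed vocabulary:

{
  "image/tiff":"GeoTIFF",
  "application/x-esri-shapefile":"Shapefile"
}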

For the institution element, each of the FGDC and ISO to GeoBlacklight transforms currently contain a variable which tries to assign an institution based on certain metadata fields. This is an adventure in free-text field mapping, so if hardcoding these is an option, that would be great. Otherwise, we would have to add conditions for each provider in the appropriate transform.

Hi Darren,

So, WMS might have a different layer id than WFS? Jeez :). We don't handle that case currently in GeoBlacklight -- just a single layer:id across all WxS services. Not sure how to handle the representation of this because dct:references is meant to map URIs to URLs only, not URIs to values.

Some use cases: a WMS and WFS with different layer ids would be very unusual (though what about a WMS and WFS with different 'workspace' names for workspace-based security?), but a WMS/WCS pair on separate servers might be more common, since there are lots of specialized WCS implementations. Harvard has different layer "names" for their tile cache and WMS services: one includes a database prefix and the other does not. An institution might also provide an ArcGIS REST server alongside a GeoServer for OGC services.

I think it’s something to consider. Even if GeoBlacklight doesn’t currently support it, it seems wise to cover your bases in providing a standard for a public repository.

geom_type is Polygon, Line, Point, or Raster. We don't have "Vector". In our metadata workflow, we've had to use OGR to detect the geom_type directly from the data. The ISO metadata didn't have it. @kimdurante? We use the geom_type for a categorical facet so it's basically an enumerated type. See https://github.com/geoblacklight/geoblacklight-schema/blob/6592a242a5af5ad1d8e1b96c97cc916ba4c47398/geoblacklight-schema.json#L70

Our enum is the same. We also don't support an unspecified "vector" type, though I wonder if we should, just for this reason.

The layer:slug is used by GeoBlacklight to provide a single URL to the GeoBlacklight page. Our slugs use our UUIDs, like so:

https://earthworks.stanford.edu/catalog/stanford-ff128mp5307

but the slug could be "stanford-area-names-dari-mazar-i-sharif" or something based on a title. The caveat is that the slugs must be unique across a GeoBlacklight instance.

I like this, and would like to include this functionality in OGP at some point. However, my question had to do with harvesting metadata for a “layer” that doesn’t have a slug or purl.

Good point about dct:provenance (Institution). We hardcode ours to Stanford in our XSLT. (see https://github.com/geoblacklight/geoblacklight-schema/blob/master/lib/xslt/iso2mods.xsl#L863)



Yesterday, I spoke with @drh-stanford about generating the format element. In MODS, we generally use MIME types for these, so that's what is currently in the stylesheets. I will remove these....
In ISO, format designations are not always included - ArcCatalog auto-generates this field but users of something like GeoNetwork would have to add this field manually and often it is left blank. So, auto-generating these using a standard set of format terms might be a good choice.

Format seems tricky to me. Is this the format of the "original" data, or the formats available for distribution? Even with just a GeoServer endpoint, I can potentially get KMZ, shape-zip, PDF, rendered images, GeoJSON, CSV, etc.

For the institution element, each of the FGDC and ISO to GeoBlacklight transforms currently contain a variable which tries to assign an institution based on certain metadata fields. This is an adventure in free-text field mapping, so if hardcoding these is an option, that would be great. Otherwise, we would have to add conditions for each provider in the appropriate transform.

I think if meta-metadata is being generated anyway, it might as well contain the Institution.

I think this is a reasonable approach (see caveats in the discussion), but I'm not sure it helps with the dct:references issue mentioned by @drh-stanford

For harvesting a layer without a slug or purl, I dunno :). In GeoBlacklight, you can append .json to the slug to get the metadata, like so:

https://earthworks.stanford.edu/catalog/stanford-ff128mp5307.json

We don't as yet have a native bulk harvesting interface in GeoBlacklight. See geoblacklight/geoblacklight#48 for our ticket on this. Lately, we've been focusing on GeoCombine (https://github.com/OpenGeoMetadata/GeoCombine) as the ingester, harvesting directly from OpenGeoMetadata repos rather than from web services. Also, for our current EarthWorks implementation, I use these tools to download metadata from the OGP network, do some QA on them, transform them into GeoBlacklight, and upload them to our Solr instance:

https://github.com/geoblacklight/geoblacklight-schema/tree/master/tools/ogp

The dc:format is indeed the file format of the original data (i.e., as deposited in our repository). When we provide the WxS endpoint, the GetCapabilities response should tell you the formats you can download via GeoServer transformations.

As for the layer:id, we might be able to encode it as a JSON value in layer.json that provides a hash for the differing layer ids between the WxS services, or just a single layer id string if they are identical. This approach, at least, keeps it out of dct:references (which is supposed to be a URI -> external-URL mapping only).
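As a sketch (the key name and shape are hypothetical), that could be a plain string when the ids are identical:

"layer_id":"druid:bd427dr8948"

or a hash keyed by the same service-type URIs used in dct:references when they differ:

"layer_id":{
  "http://www.opengis.net/def/serviceType/ogc/wms":"namespace:wms_layer_name",
  "http://www.opengis.net/def/serviceType/ogc/wfs":"namespace:wfs_layer_name"
}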

Hi all,

I've been looking at geometry types in ISO for a different project, and after reading this thread I wanted to run my findings past you.

From what I gather, it seems to me that ISO does have a place for specific vector geometry types: MD_GeometricObjectTypeCode, located at

/gmd:MD_Metadata/gmd:spatialRepresentationInfo/gmd:MD_VectorSpatialRepresentation/gmd:geometricObjects/gmd:MD_GeometricObjects/gmd:geometricObjectType/gmd:MD_GeometricObjectTypeCode

references a code list where point, curve, and surface seem to fit the bill for point, line, and polygon. For example:

<gmd:spatialRepresentationInfo>
  <gmd:MD_VectorSpatialRepresentation>
    <gmd:topologyLevel>
      <gmd:MD_TopologyLevelCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/ML_gmxCodelists.xml#MD_TopologyLevelCode" codeListValue="geometryOnly" codeSpace="ISOTC211/19115"/>
    </gmd:topologyLevel>
    <gmd:geometricObjects>
      <gmd:MD_GeometricObjects>
        <gmd:geometricObjectType>
          <gmd:MD_GeometricObjectTypeCode codeList="http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/resources/codelist/ML_gmxCodelists.xml#MD_GeometricObjectTypeCode" codeListValue="point" codeSpace="ISOTC211/19115"/>
        </gmd:geometricObjectType>
      </gmd:MD_GeometricObjects>
    </gmd:geometricObjects>
  </gmd:MD_VectorSpatialRepresentation>
</gmd:spatialRepresentationInfo>

Any thoughts on this? I'd like to know if I'm totally off base and should not be using this for the particular workflow I'm hashing out at the moment.

I don't think you're off base here. These definitions seem to fit nicely. The issue we've had is when these metadata are auto-generated: polygons and lines are sometimes classified as 'complex' or 'composite,' making it difficult to determine the geometry.
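A mapping along those lines might look something like the following sketch, where the last two entries are exactly where the ambiguity lies:

{
  "point":"Point",
  "curve":"Line",
  "surface":"Polygon",
  "complex":"(ambiguous)",
  "composite":"(ambiguous)"
}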

@gravesm, what do you think of this approach? Having a layer.json file that holds these external references and other fields not present in ISO/FGDC...

I guess my feeling is that if I am going to generate this JSON, why not just go ahead and generate the rest of the document as JSON? And if I were going to do that, I would probably be inclined to look to JSON-LD.