OGC Collection and Record Building Blocks

This repository provides a set of experimental specifications. This experiment has a few related things going on:

  • Trial smaller 'building block' specifications that are more accessible than the standard OGC format, surfacing key information from the core specs in smaller packages.
  • Cleanly define core OGC content as JSON that is not tied to an API, so it can be used as a static resource on a cloud bucket that other APIs can crawl & ingest.
  • Help provide clarity on how STAC fits in with OGC API - Records (OARec).

These points will be expanded on below. The repository contains four related 'building block specifications'. The first two are fairly pure 'extractions', in that they just pull out the key JSON fields from more extensive OGC API standard requirements, to make clear that they can be used as JSON responses or even standalone files on the cloud.

The first is the 'OGC Collection', which extracts the core JSON content of a Collection from the OGC API world, where it was originally specified as part of OGC API - Features and is also in OGC API - Common. STAC's Collection is also compatible, sharing most of the core fields.
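
As a rough, non-normative sketch, a standalone Collection document could be as small as the following. The field names follow the Collection defined in OGC API - Features; the identifier, URLs, and extent values are made up for illustration:

```json
{
  "id": "buildings",
  "title": "Building Footprints",
  "description": "Example building footprints for a small area.",
  "extent": {
    "spatial": {
      "bbox": [[-122.52, 37.70, -122.35, 37.83]]
    }
  },
  "links": [
    {
      "rel": "self",
      "type": "application/json",
      "href": "https://example.com/collections/buildings.json"
    },
    {
      "rel": "items",
      "type": "application/geo+json",
      "href": "https://example.com/collections/buildings/items"
    }
  ]
}
```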

The second is the 'OGC Record GeoJSON', which just extracts the core content model out of OARec (see table 8 of that spec - it doesn't seem to have a direct link). It could be used statically, or could go straight into a WFS with no additional API capabilities.
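
In GeoJSON form such a record is just a feature whose properties carry the record fields. The sketch below is illustrative only; the exact property names should be taken from the OARec content model (table 8), and all ids, URLs, and coordinates are placeholders:

```json
{
  "type": "Feature",
  "id": "buildings",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-122.52, 37.70], [-122.35, 37.70], [-122.35, 37.83],
      [-122.52, 37.83], [-122.52, 37.70]
    ]]
  },
  "properties": {
    "title": "Building Footprints",
    "description": "Example building footprints for a small area.",
    "type": "dataset"
  },
  "links": [
    { "rel": "self", "href": "https://example.com/records/buildings.json" }
  ]
}
```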

These are both fairly 'thin', in that they have few required fields, and are meant to be flexible containers that most any OGC resource can fit into.

The more interesting pair of specs in this repository extends the first two specifications with more required fields, to be able to fully describe a 'dataset'. This has recently emerged as a key missing piece in the current OGC API ecosystem. The aim is to describe a 'dataset' or a 'layer', at the same level of abstraction as CSW ISO-19115, with information on the publisher, license, full title and description, keywords, etc.

The first of the pair is the 'OGC Dataset Collection', which extends the OGC Collection with the key dataset fields. These are mostly extracted from the Record specification, but adapted to extend the OGC Collection (i.e. not trying to be GeoJSON).
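
A hedged sketch of what such a Dataset Collection might look like follows. The extra fields (title, description, license, keywords, publisher) mirror the dataset information listed above, but the exact field names and the dataset itself are hypothetical:

```json
{
  "id": "example-imagery",
  "title": "Example Aerial Imagery",
  "description": "Hypothetical aerial imagery dataset, used here only to illustrate the extra dataset fields.",
  "license": "CC-BY-4.0",
  "keywords": ["aerial", "imagery"],
  "publisher": "Example Mapping Agency",
  "extent": {
    "spatial": { "bbox": [[-10.0, 40.0, 5.0, 52.0]] },
    "temporal": { "interval": [["2015-01-01T00:00:00Z", null]] }
  },
  "links": [
    { "rel": "self", "href": "https://example.com/collections/example-imagery.json" }
  ]
}
```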

The second is the 'OGC Dataset Record GeoJSON', which extends the OGC Record GeoJSON with more required fields, and maps directly to the OGC Dataset Collection.

The two specs in each pair map directly to one another: one is the JSON that fits with the OGC API Collection, and the other is the equivalent GeoJSON, which can be returned from an OGC API - Features or OGC API - Records implementation.
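
To make that mapping concrete, the same hypothetical dataset from the sketch above could be expressed as a Dataset Record: the descriptive fields move into the GeoJSON properties and the extent becomes the feature geometry (again, field names and values are only illustrative):

```json
{
  "type": "Feature",
  "id": "example-imagery",
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-10.0, 40.0], [5.0, 40.0], [5.0, 52.0], [-10.0, 52.0], [-10.0, 40.0]
    ]]
  },
  "properties": {
    "title": "Example Aerial Imagery",
    "description": "Hypothetical aerial imagery dataset, used here only to illustrate the mapping.",
    "license": "CC-BY-4.0",
    "keywords": ["aerial", "imagery"],
    "publisher": "Example Mapping Agency"
  },
  "links": [
    { "rel": "self", "href": "https://example.com/records/example-imagery.json" }
  ]
}
```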

In this repo

  • OGC Collection Spec - Extraction of Collection JSON from OGC API to be a standalone file that can be used statically. Can represent any OGC API Resource.
  • OGC Record GeoJSON spec - Extraction of the core Record content model from OGC API - Records, usable as a standalone GeoJSON file.
  • OGC Dataset Collection Spec - Extends the OGC Collection with required dataset fields (mostly drawn from the Record spec) to fully describe a dataset.
  • OGC Dataset Record GeoJSON spec - Extends the OGC Record GeoJSON with more required fields, mapping directly to the OGC Dataset Collection.
  • Examples - Matched examples showing both Record and Collection instantiations of the same data.

More experiment information

Building Blocks

The core idea here is to provide 'small pieces, loosely coupled' that can be easily understood by any developer, not just those who are deep in OGC APIs. They should be short, aiming for ~5 pages, no more than 10. They would not aim to provide all the 'normative' requirements for testing, but should refer to the core standards that provide those. Initially these would target small bits of functionality that are already in OGC API standards. They could be small OpenAPI snippets like BBOX, CRS, or CQL geometry filters. Or they could be JSON content, like TileSetMetadata or the OGC Collection proposed here. There is also a space for 'OGC Patterns', non-geospatial OpenAPI components that are common in OGC APIs, like paging, limit, datetime, etc.
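
For instance, a bbox/limit building block might be little more than a reusable OpenAPI components snippet, expressed here in JSON. This is only a sketch; the schemas loosely follow the bbox and limit query parameters in OGC API - Features and would defer to that standard for the normative details:

```json
{
  "components": {
    "parameters": {
      "bbox": {
        "name": "bbox",
        "in": "query",
        "description": "Bounding box filter: minLon,minLat,maxLon,maxLat.",
        "required": false,
        "style": "form",
        "explode": false,
        "schema": {
          "type": "array",
          "minItems": 4,
          "maxItems": 6,
          "items": { "type": "number" }
        }
      },
      "limit": {
        "name": "limit",
        "in": "query",
        "description": "Maximum number of items to return.",
        "required": false,
        "schema": { "type": "integer", "minimum": 1, "default": 10 }
      }
    }
  }
}
```

An API definition could then pull these in by reference rather than redefining them in every standard.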

Ideally these building blocks will be quite easy to read and to edit. Right now we're using Markdown, but AsciiDoc likely makes more sense since that's the main format used in OGC, so we will likely convert these to AsciiDoc soon.

Static OGC ('cloud native')

One of the most interesting things to emerge from STAC is the 'static STAC' idea, where STAC JSON can just sit on a cloud storage bucket. OGC has some really nice core JSON pieces, but they are tightly coupled to the API definitions. Little actually needs to be changed, but implementors should be able to find a document that clearly describes just the content that can be placed online.

STAC Items and Records

One question that often comes up is: what is the relationship between STAC and OGC API - Records? The two specs in this repo should ideally help provide an opinionated answer to that question. It has been hard to really talk about it without the appropriate primitives in place.

The proposed primitives in the 'search geospatial data' space are:

  • A 'minimal Collection', as defined in OGC API - Features, which has 2 required fields (id and links) and a handful of optional ones (extent, title, description, etc), as articulated in the OGC Collection Spec in this repo.
  • A 'Dataset Collection', as the OGC Dataset Collection Spec in this repo describes. This aims to provide enough information to enable 'collection search'.
  • A STAC Item, which is part of a Collection and links to some type of geospatial information. A STAC Item is a GeoJSON feature.

STAC provides both Collections and STAC Items, as well as an API to search STAC Items. In STAC there is a clear relationship, where each Item is a member of a Collection. To make it concrete, 'Landsat 8' is a Collection, while a particular Landsat scene is an Item.
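
As an example, a (heavily trimmed) STAC Item for a single scene points back to its Collection through the collection field. The overall shape follows the STAC Item spec, but the id, dates, coordinates, and URLs below are all made up:

```json
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "example-landsat-scene-20200101",
  "collection": "landsat-8",
  "bbox": [-122.9, 46.9, -120.6, 48.6],
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-122.9, 46.9], [-120.6, 46.9], [-120.6, 48.6], [-122.9, 48.6], [-122.9, 46.9]
    ]]
  },
  "properties": { "datetime": "2020-01-01T18:45:00Z" },
  "assets": {
    "thumbnail": {
      "href": "https://example.com/landsat-8/example-landsat-scene-20200101/thumb.jpg",
      "type": "image/jpeg"
    }
  },
  "links": [
    { "rel": "collection", "href": "https://example.com/collections/landsat-8" }
  ]
}
```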

The STAC API provides 'Item search', letting a user include one or more collections in a search (by area, time and/or properties), returning all STAC Items that match it.
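
In practice an Item search is a small request. Something like the following body, posted to a STAC API's search endpoint, asks for Items from one collection within a bounding box and date range; the values are placeholders:

```json
{
  "collections": ["landsat-8"],
  "bbox": [-122.9, 46.9, -120.6, 48.6],
  "datetime": "2020-01-01T00:00:00Z/2020-12-31T23:59:59Z",
  "limit": 50
}
```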

It does not have 'Collection search', which would let a user search for collections by area, time and/or properties. This would let them find 'NAIP' in the US, or an aerial imagery collection in the Netherlands, or a nice global dataset like Sentinel 2. The geometry of these is the area of the entire collection. This is a feature that many STAC users are asking for.

The STAC core team has been waiting for the OARec API to define that collection-level search. But the needs are actually less than a fully defined new API. What STAC actually needs is mostly the OGC Record GeoJSON spec: a way to represent a 'Collection' as GeoJSON that can either go directly into the STAC Features API as a special layer, or be defined as a particular endpoint, but that works with the exact same Features API mechanics as STAC Item Search (cross collection) or the individual collection/items endpoints in STAC's Features APIs. STAC implementations could choose to use the Sort extension, aligning STAC with OGC, and the Filter (CQL) extension.

For those with a deep OGC background, what we're proposing is that the STAC API should serve the same role as the ebRIM CSW did in the past, enabling search of millions of scenes/granules. Collection Search (the Records API) should then play the role of the ISO-19115 CSW - primarily looking for 'layers', where you'd expect someone to fill out metadata. These can be imagery collections, but also vector information like 'building footprints' or 'roads'. These would ideally link to a Features API. But the standalone OGC Dataset Collection Spec opens up the potential for 'static collections', where a Catalog crawls GeoPackages with Collection JSON sidecars, ingesting them and turning them into Record GeoJSON - and potentially even ingesting their source data into an OGC API - Features implementation.

One further question often posed is where n-dimensional data like netCDF and zarr fit in. For the most part those should be at the 'collection' level. It makes less sense to make them individual STAC Items when it's one big dataset with lots of information. But it depends on how the files are used. One could see a series of netCDFs that are all ocean temperatures with different time and space ranges, but that fit together into an overall 'collection', and in that case you'd want to use STAC Items to find the individual pieces. Similarly, zarr can hold huge amounts of data, so you could have STAC Items that point into various pieces of it. It is more that with STAC you want to search for little 'parts' of the entire collection. But if you're representing the entire collection as one file then it likely makes less sense to put that in STAC.
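
As a sketch of that pattern, each netCDF covering one time/space slice could be exposed as a STAC Item whose asset simply points at the file. The ids, dates, and media type here are assumptions for illustration:

```json
{
  "type": "Feature",
  "stac_version": "1.0.0",
  "id": "sst-2020-01",
  "collection": "ocean-temperature",
  "bbox": [-180.0, -60.0, 180.0, 60.0],
  "geometry": {
    "type": "Polygon",
    "coordinates": [[
      [-180.0, -60.0], [180.0, -60.0], [180.0, 60.0], [-180.0, 60.0], [-180.0, -60.0]
    ]]
  },
  "properties": {
    "datetime": null,
    "start_datetime": "2020-01-01T00:00:00Z",
    "end_datetime": "2020-01-31T23:59:59Z"
  },
  "assets": {
    "data": {
      "href": "https://example.com/ocean-temperature/sst-2020-01.nc",
      "type": "application/x-netcdf"
    }
  },
  "links": []
}
```

The Items provide the search granularity, while the overall dataset description stays at the Collection (or Dataset Record) level.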

So in conclusion, there is a space for both STAC and OARec. STAC should focus on Item level search: scenes/granules/point cloud captures/etc. OARec should focus on 'dataset' search.