radiantearth/stac-spec

absolute/relative references

Closed this issue · 13 comments

See #160 and #56 for related discussion.

Definitions with examples:

  • absolute url
    • https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/066/096/CBERS_4_MUX_20181024_066_096_L2.json
  • absolute path reference: /CBERS4/MUX/066/096/CBERS_4_MUX_20181024_066_096_L2.json
    • starts with / character
    • No use of ../
  • relative path reference: 066/096/CBERS_4_MUX_20181024_066_096_L2.json, ../catalog.json
    • does not start with / character
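The three definitions above can be turned into a small classifier. The following is a minimal Python sketch using the standard library's urllib.parse. Treating any href with a scheme (including s3://, which the sample item later uses for assets) as an absolute URL is an assumption of this sketch, and scheme-less references such as //host/path simply fall into the 'absolute path' bucket here.

```python
from urllib.parse import urlparse


def classify_reference(href: str) -> str:
    """Classify a STAC link href as absolute url, absolute path or relative path.

    Assumption of this sketch: any href with a URI scheme (https://, s3://, ...)
    is an absolute URL; scheme-less '//host/path' counts as an absolute path.
    """
    if urlparse(href).scheme:
        return "absolute url"
    if href.startswith("/"):
        return "absolute path"
    return "relative path"


examples = [
    "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/066/096/CBERS_4_MUX_20181024_066_096_L2.json",
    "/CBERS4/MUX/066/096/CBERS_4_MUX_20181024_066_096_L2.json",
    "066/096/CBERS_4_MUX_20181024_066_096_L2.json",
    "../catalog.json",
]
for href in examples:
    print(f"{classify_reference(href):14} {href}")
```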

Pros/Cons:

  • absolute url
    • PRO: The web address can be obtained directly from the STAC file.
    • PRO: No parsing needed to obtain the absolute link.
    • CON: Requires data to be online.
    • CON: Requires the site and protocol to be defined when creating the STAC file.
    • CON: Requires a global definition of the file organization structure.
    • CON: All STAC files need to be updated if the site or organization structure changes.
      • That includes the case where the STAC set or subset is copied to another location, assuming we want the references to link to data in the same 'location'.
  • absolute path
    • Simple parsing needed to obtain the absolute link when browsing (prefix with the site).
    • PRO: Does not require data to be online.
    • PRO: Does not require the site and protocol to be defined when creating the STAC file.
    • CON: Requires a global definition of the file organization structure.
    • CON: All STAC files need to be updated if the organization structure changes.
      • That includes the case where a STAC subset is copied to another location, assuming we want the references to link to data in the same 'location'.
    • CON: Given only the STAC file, its web address needs to be obtained through another mechanism.
  • relative path
    • CON: Parsing needed to obtain the absolute link when browsing.
      • Multiple ../ segments must be handled, and distinct relative paths may point to the same file.
    • PRO: Does not require data to be online.
    • PRO: Does not require the site and protocol to be defined when creating the STAC file.
    • PRO: Requires only a local definition of the file organization structure.
    • PRO: Only some STAC files need to be updated if the local organization structure changes.
    • CON: Given only the STAC file, its web address needs to be obtained through another mechanism.
    • PRO: Supports set/subset copies, breaking only the links to elements that were not copied.
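Both the 'prefix with site' step for absolute paths and the ../ handling for relative paths are standard URL reference resolution (RFC 3986), which Python's urllib.parse.urljoin implements. In this minimal sketch the base catalog location is a hypothetical choice; the point is that an absolute path and two different relative paths resolve to the same URL, so a client should compare resolved URLs rather than raw hrefs.

```python
from urllib.parse import urljoin

# Hypothetical location of the catalog file that contains the links below.
BASE = "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/catalog.json"

refs = {
    # absolute path: only the site (scheme + host) of BASE is reused
    "absolute path": "/CBERS4/MUX/066/096/CBERS_4_MUX_20181024_066_096_L2.json",
    # relative path, resolved against the directory of BASE
    "relative path": "066/096/CBERS_4_MUX_20181024_066_096_L2.json",
    # a different relative path that climbs with ../ and ends at the same file
    "relative path with ../": "../MUX/066/096/CBERS_4_MUX_20181024_066_096_L2.json",
}

resolved = {name: urljoin(BASE, href) for name, href in refs.items()}
for name, url in resolved.items():
    print(f"{name:24} -> {url}")

# urljoin removes dot segments, so all three are equal once resolved
assert len(set(resolved.values())) == 1
```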

I'd say that if I had to choose one model, I'd go with absolute paths, with a self reference that optionally includes an absolute URL (permalink) if this information is available when creating the STAC files.

Update 02/04/2019

My feeling from the meeting is that any constraint on the reference types will affect some relevant use case. In that scenario, maybe we should make it flexible, even if that increases complexity on the client.

For instance, we could allow any kind of reference, enforcing only an absolute URL (preferred) or absolute path for self, and make self optional to handle the case where there is no way to obtain the absolute reference/URL.

Update 02/16/2019

My current proposal: recommend relative links for everything except self. This supports the 'copy set/subset' use case described above. Since we want to support data that is not going online, we need to change the self definition. Possible alternatives:

  • Making it optional and keeping the absolute URL requirement
  • Keeping it required but allowing absolute references for the 'not online' use case
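The 'copy set/subset' argument for relative links can be illustrated by resolving the same two links against the original location and against a copy. Both catalog locations below are hypothetical: the relative link follows the copy, while the absolute URL keeps pointing at the original bucket.

```python
from urllib.parse import urljoin

# Hypothetical original catalog location and a hypothetical mirror of the subset.
ORIGINAL = "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/066/catalog.json"
COPY = "https://mirror.example.org/stac/CBERS4/MUX/066/catalog.json"

relative_child = "096/CBERS_4_MUX_20181024_066_096_L2.json"
# The same child written as an absolute URL when the catalog was created.
absolute_child = urljoin(ORIGINAL, relative_child)

for base in (ORIGINAL, COPY):
    print("catalog:", base)
    print("  relative ->", urljoin(base, relative_child))  # follows the copy
    print("  absolute ->", urljoin(base, absolute_child))  # always the original
```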

> My current proposal: recommend relative links for everything except self. This supports the 'copy set/subset' use case described above. Since we want to support data that is not going online, we need to change the self definition. Possible alternatives:

Yes, I agree, and I think relative links are the most universal and useful solution.

> • Making it optional and keeping the absolute URL requirement
> • Keeping it required but allowing absolute references for the 'not online' use case

I'd say "Make it optional and require either absolute path or absolute URL". Here's why:

  • First use case: Static catalog (NOT online) -> absolute path (but I'm not sure whether it is helpful for anything)
  • Second use case: Static catalog (online) -> absolute URL sounds very helpful
  • Third use case (biased by my openEO view): Collection-only APIs such as GEE or openEO (discovery), or full catalogs with items that can be downloaded from a protected (non-permanent) user workspace (openEO results): self link not required or not available.
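The rules converging here (self optional, and an absolute URL or absolute path when present; relative links recommended elsewhere) could be checked mechanically. This is a hedged sketch that paraphrases the discussion, not the rules the spec eventually adopted; is_absolute and check_links are hypothetical helpers, and non-relative links only produce a warning because the proposal is a recommendation.

```python
from __future__ import annotations

from urllib.parse import urlparse


def is_absolute(href: str) -> bool:
    """True for an absolute URL (scheme + host) or an absolute path ('/...')."""
    parsed = urlparse(href)
    return bool(parsed.scheme and parsed.netloc) or href.startswith("/")


def check_links(links: list[dict]) -> list[str]:
    """Check links against the rules proposed here (a sketch, not the spec).

    Assumed rules:
      * 'self' is optional; when present it must be an absolute URL or path.
      * all other links are recommended (not required) to be relative.
    """
    problems = []
    for link in links:
        rel, href = link["rel"], link["href"]
        if rel == "self":
            if not is_absolute(href):
                problems.append(f"error: self must be absolute, got {href!r}")
        elif is_absolute(href):
            problems.append(f"warning: {rel} link is not relative: {href!r}")
    return problems


links = [
    {"rel": "self", "href": "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4.json"},
    {"rel": "parent", "href": "../catalog.json"},
    {"rel": "collection", "href": "https://cbers-stac-0-6.s3.amazonaws.com/collections/CBERS_4_MUX_collection.json"},
]
for problem in check_links(links):
    print(problem)
```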

I'm leaning towards relative links as well, with a few caveats to explore. In particular, I'd like to consider the API use case and whether we can just require absolute links there. I'd also like to consider whether we should explicitly call out an 'online static catalog' that does have absolute URLs, though perhaps we just put them at the 'collection' level instead of in each item.

I definitely lean towards 'make it optional' for the self link. I think a self link with a relative path is not useful, so let's just leave it off.

I can't quite follow what you are describing for the API use case. Why does the API use case require an absolute URL?

> I'd say "Make it optional and require either absolute path or absolute URL". Here's why:
>
> • First use case: Static catalog (NOT online) -> absolute path (but I'm not sure whether it is helpful for anything)
> • Second use case: Static catalog (online) -> absolute URL sounds very helpful
> • Third use case (biased by my openEO view): Collection-only APIs such as GEE or openEO (discovery), or full catalogs with items that can be downloaded from a protected (non-permanent) user workspace (openEO results): self link not required or not available.

I'm OK with that.

@cholmes I'm currently exposing the static catalog through an API, and it would be better to use the same document for both the static catalog and the API. In that case, forcing absolute URLs for the API would mean also using absolute URLs for the static catalog.

@fredliporace Couldn't you simply add the link in the API? Or isn't that going through some server-side processing?

Well, while working on this answer, I think I understood @cholmes's concern.
My current development implementation for the API has the following address, which will be changed for production:

https://4jp7f1hqlj.execute-api.us-east-1.amazonaws.com/prod/stac/search/

A sample of the data currently returned is:

{
  "type": "FeatureCollection",
  "features": [
    {
      "id": "CBERS_4_MUX_20190203_162_119_L4",
      "type": "Feature",
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
          [
            [
              [
                -52.981499,
                -17.447696
              ],
              [
                -51.850932,
                -17.618832
              ],
              [
                -51.60912,
                -16.556977
              ],
              [
                -52.733004,
                -16.386859
              ],
              [
                -52.981499,
                -17.447696
              ]
            ]
          ]
        ]
      },
      "bbox": [
        -52.986338,
        -17.61936,
        -51.606218,
        -16.371447
      ],
      "properties": {
        "datetime": "2019-02-03T13:25:31Z",
        "eo:sun_azimuth": 94.5659,
        "eo:sun_elevation": 57.8094,
        "eo:off_nadir": -0.00913168,
        "eo:epsg": 32751,
        "cbers:data_type": "L4",
        "cbers:path": 162,
        "cbers:row": 119
      },
      "links": [
        {
          "rel": "self",
          "href": "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4.json"
        },
        {
          "rel": "parent",
          "href": "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/162/catalog.json"
        },
        {
          "rel": "collection",
          "href": "https://cbers-stac-0-6.s3.amazonaws.com/collections/CBERS_4_MUX_collection.json"
        }
      ],
      "assets": {
        "thumbnail": {
          "href": "https://s3.amazonaws.com/cbers-meta-pds/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4/CBERS_4_MUX_20190203_162_119.jpg",
          "type": "image/jpeg"
        },
        "metadata": {
          "href": "s3://cbers-pds/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4/CBERS_4_MUX_20190203_162_119_L4_BAND6.xml",
          "title": "INPE original metadata",
          "type": "text/xml"
        },
        "B5": {
          "href": "s3://cbers-pds/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4/CBERS_4_MUX_20190203_162_119_L4_BAND5.tif",
          "type": "image/vnd.stac.geotiff; cloud-optimized=true",
          "eo:bands": [
            0
          ]
        },
        "B6": {
          "href": "s3://cbers-pds/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4/CBERS_4_MUX_20190203_162_119_L4_BAND6.tif",
          "type": "image/vnd.stac.geotiff; cloud-optimized=true",
          "eo:bands": [
            1
          ]
        },
        "B7": {
          "href": "s3://cbers-pds/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4/CBERS_4_MUX_20190203_162_119_L4_BAND7.tif",
          "type": "image/vnd.stac.geotiff; cloud-optimized=true",
          "eo:bands": [
            2
          ]
        },
        "B8": {
          "href": "s3://cbers-pds/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4/CBERS_4_MUX_20190203_162_119_L4_BAND8.tif",
          "type": "image/vnd.stac.geotiff; cloud-optimized=true",
          "eo:bands": [
            3
          ]
        }
      }
    }
  ]
}

I'm currently using absolute links. If I were using relative links, it would not be possible to follow the resulting links directly. The browser would have to build the link from 'self' and then apply the relative information.
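The client-side step described here, building each link from 'self' and then applying the relative part, is a single urljoin call per link. A minimal sketch, assuming the parent and collection links of the sample item were rewritten as relative links (those relative hrefs are hypothetical); resolved against the absolute self href, they give back the parent and collection URLs shown in the sample.

```python
from urllib.parse import urljoin

# 'self' as in the sample above; parent/collection rewritten as hypothetical
# relative links that point to the same targets as the sample's absolute ones.
item_links = [
    {"rel": "self", "href": "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4.json"},
    {"rel": "parent", "href": "../catalog.json"},
    {"rel": "collection", "href": "../../../../collections/CBERS_4_MUX_collection.json"},
]


def resolve_against_self(links):
    """Resolve every href against the item's absolute self href."""
    self_href = next(link["href"] for link in links if link["rel"] == "self")
    return {link["rel"]: urljoin(self_href, link["href"]) for link in links}


for rel, url in resolve_against_self(item_links).items():
    print(f"{rel:10} {url}")
```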

I'm documenting this use case in #401.

Why would we not be able to follow them? A client just needs to resolve the relative links against the URL it requested, I guess. Of course, if links point to another server or similar, you'd need absolute URLs. So we must allow absolute URLs and relative URLs for all links, except self, which is optional and must be an absolute URL (and maybe an absolute path, but I'm still not sure how useful that is). In the end it seems we are just allowing what the WWW/HTML allows, which has worked for ages. ;)

In that case the requested URL would be

https://4jp7f1hqlj.execute-api.us-east-1.amazonaws.com/prod/stac/search/

and the parent relative link would be something like

../catalog.json

so that would not be a simple concatenation of the requested URL and the relative link. This kind of resolution works well for static pages, but not so well if you use something like the STAC search API.
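Resolving that relative parent link against the search request URL shows the problem concretely. This sketch uses the same RFC 3986 resolution a browser applies; the result lands on the API host rather than on the catalog that the sample item's parent link points to.

```python
from urllib.parse import urljoin

REQUEST_URL = "https://4jp7f1hqlj.execute-api.us-east-1.amazonaws.com/prod/stac/search/"
# Parent href from the sample item above, i.e. what resolution should produce.
INTENDED_PARENT = "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/162/catalog.json"

naive = urljoin(REQUEST_URL, "../catalog.json")
print(naive)  # https://4jp7f1hqlj.execute-api.us-east-1.amazonaws.com/prod/stac/catalog.json
print(naive == INTENDED_PARENT)  # False
```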

Well, I expected that an API would generate meaningful relative URLs and not just pass through the URLs from the static catalog. Passing the URLs through will never work with relative links in the API, of course.

@fredliporace - I don't think using the same document for static and API is the way to go. What I'm leaning towards is that 'self-contained catalogs' are static ones that follow a recommendation to make all links relative. And then I'm thinking of even going so far as to require that STAC APIs return absolute self links. Like @m-mohr, I expect an API to generate its own URLs, and for API URLs to almost always be absolute.

Indeed, I think there could be a recommendation that an API powered by a static catalog use a rel link to point back to where the static catalog lives. Perhaps even use rel=canonical, to say 'this is the core location where this item lives'.
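One way to read this suggestion: an API backed by a static catalog generates its own absolute self link and keeps the static location as rel=canonical. This is only a sketch; the API base URL and the /items/{id} route are hypothetical, and resolving the remaining links to absolute URLs against the static location is an assumption that follows from the point that an API should generate its own (absolute) URLs.

```python
import copy
from urllib.parse import urljoin

# Hypothetical API base URL; the '/items/{id}' route is also an assumption.
API_BASE = "https://api.example.com/stac"


def to_api_item(static_item: dict) -> dict:
    """Re-publish a static-catalog item through an API (illustrative sketch).

    * 'self' becomes an absolute URL generated by the API.
    * The original static location is kept as rel='canonical'.
    * Every other link is resolved to an absolute URL against the static
      location, so relative links from the static catalog stay followable.
    """
    item = copy.deepcopy(static_item)
    static_self = next(link["href"] for link in item["links"] if link["rel"] == "self")
    links = [
        {**link, "href": urljoin(static_self, link["href"])}
        for link in item["links"]
        if link["rel"] != "self"
    ]
    links.insert(0, {"rel": "self", "href": f"{API_BASE}/items/{item['id']}"})
    links.append({"rel": "canonical", "href": static_self})
    item["links"] = links
    return item


static_item = {
    "id": "CBERS_4_MUX_20190203_162_119_L4",
    "links": [
        {"rel": "self", "href": "https://cbers-stac-0-6.s3.amazonaws.com/CBERS4/MUX/162/119/CBERS_4_MUX_20190203_162_119_L4.json"},
        {"rel": "parent", "href": "../catalog.json"},
    ],
}
for link in to_api_item(static_item)["links"]:
    print(link)
```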

A few weeks back, I'd vaguely proposed a sidecar that would facilitate using relative URLs for all links (including self), including when a catalog has been copied from its original location. Originally, I was thinking that it would need to exist for each sub-catalog and that both parent catalogs and associated sidecars would need to be read/navigated in order to resolve a URL for a child or item.

However, since we're encouraging publishers to include both parent and root rels, a minor addition to the root link would allow us to resolve self with only a single sidecar file (and a single read; no need to navigate or read parent catalogs). If we include the reverse link (i.e. if root's href is ../catalog.json, the reverse might be 12/catalog.json), we know the path from the root (which would have a sidecar file containing its absolute URL) to the sub-catalog/item and can easily produce an absolute URL.

HTML's rev attribute appears to describe this pattern (but has been dropped in HTML5). Potential ways to describe this could be:

{
  "rel": "root",
  "href": "../catalog.json"
},
{
  "rev": "root",
  "href": "12/catalog.json"
}

Alternatively, something like the following (the name 'from' doesn't feel quite right, but the idea is that it would be an additional attribute within the link):

{
  "rel": "root",
  "href": "../catalog.json",
  "from": "12/catalog.json"
}
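The sidecar idea can be sketched with the 'from' variant: the root's sidecar supplies one absolute URL, and joining it with 'from' yields the sub-catalog's absolute self URL without reading any parent catalogs. The sidecar contents, the example.com URL, and the resolve_self helper are hypothetical, and the consistency check (the root href resolved from the derived self must land back on the root) is an extra safeguard added for illustration.

```python
from urllib.parse import urljoin

# Hypothetical sidecar contents for the root catalog: just its absolute URL.
ROOT_SIDECAR = "https://example.com/stac/catalog.json"

# Links of a sub-catalog, using the proposed (tentative) 'from' attribute.
sub_catalog_links = [
    {"rel": "root", "href": "../catalog.json", "from": "12/catalog.json"},
    {"rel": "item", "href": "item-1.json"},
]


def resolve_self(links, root_url):
    """Derive the sub-catalog's absolute URL from the root URL and 'from'."""
    root = next(link for link in links if link["rel"] == "root")
    self_url = urljoin(root_url, root["from"])
    # Following 'href' back from the derived self must land on the root again.
    if urljoin(self_url, root["href"]) != root_url:
        raise ValueError("root href and 'from' are inconsistent")
    return self_url


self_url = resolve_self(sub_catalog_links, ROOT_SIDECAR)
print(self_url)                          # https://example.com/stac/12/catalog.json
print(urljoin(self_url, "item-1.json"))  # https://example.com/stac/12/item-1.json
```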

#414 closes this, with some additional color on it coming in #428