developmentseed/cogeo-mosaic-tiler

Use with mission-specific tilers

Closed this issue · 3 comments

I know you currently have awspds-mosaic as a fork of cogeo-mosaic-tiler, but I think it's worth considering integrating support for mission-specific tilers inside cogeo-mosaic-tiler.

It would allow you to maintain a single repository, and users who want to tile arbitrary COGs, CBERS, Landsat, etc. could use a single Lambda function.

Thoughts:

There would need to be a way to determine from the MosaicJSON which rio-tiler tiling function to use, so that arbitrary URLs use the default tiler, Landsat scene IDs use the Landsat tiler, CBERS scenes use the CBERS tiler, etc.

  • MosaicJSON top-level mission key: something like

     // Optional. A string indicating that the values of `tiles` represent scene ids of the designated mission instead of fully-qualified URLs.
     "mission": "landsat8"
    

    in which case every value of every quadkey is interpreted as a scene id

  • Quadkey-level prefixes: in order to have the tile reader vary across tiles, you could have quadkey-level prefixes. The current spec is quite flexible about allowing either a URL or a scene id:

    // REQUIRED. A dictionary of per quadkey dataset in form of {quadkeys: [datasets]} pairs.
    // Keys MUST be valid quadkeys index with zoom level equal to mosaic `minzoom` (or `quadkey_zoom` if present).
    // Values MUST be arrays of strings (url or sceneid) pointing to a 
    // Cloud Optimized dataset with bounds intersecting with the quadkey bounds.
    "tiles": {
        "030130": [
            "s3://my-bucket/dir/file1.tif",
            "s3://my-bucket/dir/file2.tif"
        ]
    }
    

    A mosaic currently created with awspds-mosaic has sceneids of the form:

    "0231120": ["LC08_L1TP_029035_20160720_20180131_01_T1", "LC08_L1TP_029034_20160720_20180131_01_T1", "LC08_L1TP_029036_20130610_20170310_01_T1"]

    Other than attempting string matching against the scene ID, there's no way to know these correspond to Landsat scenes.

    Instead valid landsat scenes could be something like:

    "0231120": ["s3://landsat-pds/LC08_L1TP_029035_20160720_20180131_01_T1", "s3://landsat-pds/LC08_L1TP_029034_20160720_20180131_01_T1", "s3://landsat-pds/LC08_L1TP_029036_20130610_20170310_01_T1"]

    where any path starting with s3://landsat-pds/ is interpreted as a prefix for a scene id, and the rest of the url is interpreted as one.

    This could have issues, however, if someone ever specifies a fully qualified path starting with s3://landsat-pds all the way to a COG asset. To prevent that, you could use a URL scheme like landsat://<scene_id>, cbers://<scene_id>, etc.
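A minimal sketch of how a tiler might interpret such scheme-prefixed entries. The `landsat://` and `cbers://` schemes are only the suggestion made above, not part of the MosaicJSON spec, and `MISSION_SCHEMES` and `classify_asset` are hypothetical names used for illustration:

```python
from typing import Optional, Tuple

# Hypothetical mapping from URL scheme to mission name. These schemes are
# the ones proposed in this issue; they are not defined by any spec.
MISSION_SCHEMES = {"landsat": "landsat", "cbers": "cbers"}


def classify_asset(asset: str) -> Tuple[Optional[str], str]:
    """Split one `tiles` entry into (mission, identifier).

    For mission schemes the identifier is the scene id. For anything else
    the mission is None and the identifier is the entry unchanged, so it
    can be handed to the default COG tiler.
    """
    scheme, sep, rest = asset.partition("://")
    mission = MISSION_SCHEMES.get(scheme.lower()) if sep else None
    if mission is None:
        return None, asset
    return mission, rest
```

With this approach `s3://landsat-pds/...` stays an ordinary URL, so a fully qualified path into that bucket is never mistaken for a scene id.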

thanks for the issue @kylebarron,

It's true we started working on MosaicJSON with Landsat data at Devseed, but we then realized there was a lot of other data that could use a simple mosaic solution. As mentioned, we wanted a really simple solution, so avoiding custom edge cases was really important.

On the cogeo-mosaic-tiler side, I'd also love to keep it as simple as possible, and adding missions to the tiler seems kinda the opposite. Also worth noting that missions like Landsat/CBERS/Sentinel are all hosted in different regions. A single endpoint for COGs and missions won't be performant (and may be costly because of inter-region data transfer).

MosaicJSON top-level mission key: something like
// Optional. A string indicating that the values of tiles represent scene ids of the designated mission instead of fully-qualified URLs.
"mission": "landsat8"

I'm 👎 on making changes to mosaic-json. What if someone wants to use landsat8 on GCP, or a different processing level (L1C, L2A)? mosaic-json shouldn't be tied to rio-tiler specifics.

While I'm 👎 on making this kind of change, I think we could accommodate this by making cogeo-mosaic-tiler more extensible (with mission-specific plugins).

This would mostly mean changes at the application level, where we would need to merge multiple lambda-proxy API objects into a single app.

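from lambda_proxy.proxy import API

from cogeo_mosaic_tiler.handlers import app as cogeo_app
# Hypothetical mission-specific plugin package
from cogeo_mosaic_tiler_landsat.handlers import app as landsat_app

# Merge the routes of both apps into one (add_routes may no longer exist in lambda-proxy)
app = API(name="cogeo-mosaic-tiler")
app.add_routes(cogeo_app.routes)
app.add_routes(landsat_app.routes)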

I've done this in the past, but I'm not sure it's still possible with lambda-proxy.

Self-describing MosaicJSON

I think my core complaint here is that the current MosaicJSON spec is not fully self-describing. If I give you a valid TileJSON, you will always be able to load the full extent of the map with no other information.

If you give me a currently valid MosaicJSON whose tiles values are not fully-qualified URLs, I have no way to parse it accurately without outside knowledge.

What if someone wants to use landsat8 on GCP, or different level L1C L2A. mosaic-json shouldn't be tied to rio-tiler specifics

I don't think "mission": "landsat8" ties values to rio-tiler specifics! It ties Scene IDs to an identifier that describes how to parse them! Is it not true that a landsat SCENE_ID is defined by NASA/USGS, and thus uniquely defines the image (not the path to the image!) no matter where it is hosted? Whether a piece of software loads that scene from AWS or GCP or wherever is outside the scope of the spec, but defining the mission gives all the pieces of information to construct a full path.

Similarly, for Sentinel 1, Sentinel 2, and CBERS you can arbitrarily reconstruct metadata/paths to an image from a Scene ID, which is separate from the current rio-tiler implementation that specifically points to AWS.
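As a rough illustration of that point, the sketch below builds band URLs for the same Landsat 8 Collection 1 scene id on two different hosts. The bucket layouts are written from memory of the public AWS and GCP Landsat buckets and should be treated as assumptions, and `HOST_TEMPLATES`, `parse_landsat_scene_id`, and `band_url` are hypothetical names:

```python
# Assumed bucket layouts for Landsat 8 Collection 1 (verify before relying on them).
HOST_TEMPLATES = {
    "aws": "s3://landsat-pds/c1/L8/{path}/{row}/{scene_id}/{scene_id}_B{band}.TIF",
    "gcp": "gs://gcp-public-data-landsat/LC08/01/{path}/{row}/{scene_id}/{scene_id}_B{band}.TIF",
}


def parse_landsat_scene_id(scene_id: str) -> dict:
    """Split a Collection 1 product id such as
    LC08_L1TP_029035_20160720_20180131_01_T1 into its components."""
    parts = scene_id.split("_")
    if len(parts) != 7:
        raise ValueError(f"Unexpected Landsat scene id: {scene_id}")
    sensor_sat, level, path_row, acquired, processed, collection, tier = parts
    return {
        "sensor": sensor_sat[:2],
        "satellite": sensor_sat[2:],
        "level": level,
        "path": path_row[:3],
        "row": path_row[3:],
        "acquired": acquired,
        "collection": collection,
        "tier": tier,
    }


def band_url(scene_id: str, band: int, host: str = "aws") -> str:
    """Build the URL of one band for the given host from the scene id alone."""
    meta = parse_landsat_scene_id(scene_id)
    return HOST_TEMPLATES[host].format(
        scene_id=scene_id, band=band, path=meta["path"], row=meta["row"]
    )
```

The scene id is the only input in both cases; which host to read from is a choice made by the software, not by the mosaic.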

If I were to define a spec, I would say that either all paths must be fully qualified, using any protocol (http(s)://, s3://, gcp://, azure://, or whatever), or scene ids may be used as identifiers only when mission is also present, so that there's always a way to parse them.

You don't need to define which mission values are valid; that can be open-ended, and programs that implement the spec can converge on specific values.
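The rule above could be checked with something like the following sketch. The `mission` key is the proposal from this issue rather than part of the current spec, and treating any string containing `://` as fully qualified is a simplifying assumption:

```python
def is_fully_qualified(asset: str) -> bool:
    # Simplifying assumption: anything with a scheme separator counts as a URL.
    return "://" in asset


def is_self_describing(mosaic: dict) -> bool:
    """Return True if every `tiles` entry can be resolved without outside knowledge.

    Either a (proposed, non-standard) top-level `mission` key says how to
    interpret scene ids, or every entry must be a fully qualified URL.
    """
    if mosaic.get("mission"):
        return True
    return all(
        is_fully_qualified(asset)
        for assets in mosaic.get("tiles", {}).values()
        for asset in assets
    )
```

A mosaic of bare Landsat scene ids fails this check until a `mission` value is added, which is exactly the gap described above.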

cogeo-mosaic-tiler

On the cogeo-mosaic-tiler side, I'd also love to keep it as simple as possible. Thus adding missions to the tiler seems kinda the opposite

Does it need to be any more complicated than this?

from rio_tiler.main import tile as cogeoTiler
from rio_tiler.cbers import tile as cbersTiler
from rio_tiler.landsat import tile as landsatTiler
...
# Pick the rio-tiler tile function that matches the mosaic's mission
if mission == 'landsat':
    tiler = landsatTiler
elif mission == 'cbers':
    tiler = cbersTiler
else:
    tiler = cogeoTiler
tile, mask = mosaic_tiler(
    assets,
    x,
    y,
    z,
    tiler,
    tilesize=tile_size,
    pixel_selection=pixsel_method(),
    resampling_method=resampling_method,
    **kwargs
)

I would argue it's fine to provide default support for these individual tilers because it's an implementation detail.

Also worth noting that mission like landsat/cbers/sentinel are all hosted on different region. Creating only one endpoint for COG and missions won't be performant (and maybe costly because of inter-region data transfer).

Yes, that is a great point I overlooked. You wouldn't want to use a single instance in one region for all missions, but you could deploy the same code to each region and very easily serve any of them.

closing this one, may revisit later ;-)