sat-utils/sat-api

Increase limits for collections endpoints.

Closed this issue · 6 comments

I do not recall what the actual problem was here. I think a hard-coded limit was applied when querying collections, but I can't find where it is set.

Since I'm not sure what problem this causes or which use case it affects, I'm moving it to 0.3.0 so we can gather more information first.

@matthewhanson This causes a problem when ingesting items in catalogs that contain more than 10 collections. The elasticsearch.Client.search method takes an optional size argument, which defaults to 10. When removing hierarchical links from items, sat-api checks the ID of the item's collection (stac_item.properties.collection) against the array of collections returned by a search of the collections index.

Because this array is limited to 10 entries by default, sat-api may conclude that an item does not belong to a collection when it actually does. The item is still ingested successfully, but any properties defined at the collection level as commons are not transferred to the item. The logs show the following error when this happens:

error: 1102311 has no collection
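A minimal sketch of the failure mode, written in Python for illustration only. `search_collections` is a hypothetical in-memory stand-in for a search on the collections index; it only mimics the behaviour described above by returning at most `size` hits, with a default of 10. Real Elasticsearch does not guarantee which 10 collections an unsorted query returns, so in practice the "missing" collection can vary between runs.

```python
# Hypothetical stand-in for a search on the collections index: like the
# client's `size` argument, it returns at most `size` hits (default 10).
def search_collections(index, size=10):
    return index[:size]


def has_collection(item, collections):
    """Check the item's collection ID against the IDs that the search returned."""
    return item["properties"]["collection"] in {c["id"] for c in collections}


# A catalog with 12 collections; the item belongs to the 12th one.
collections_index = [{"id": f"collection-{i}"} for i in range(12)]
item = {"id": "1102311", "properties": {"collection": "collection-11"}}

truncated = search_collections(collections_index)            # default size=10
complete = search_collections(collections_index, size=100)   # large enough

print(len(truncated), has_collection(item, truncated))  # 10 False
print(len(complete), has_collection(item, complete))    # 12 True
```

With the default size, the item's collection is never seen, which matches the "has no collection" error above even though the collection exists.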

While investigating this, I found a related bug: #192

Nice find @geospatial-jeff, thanks for tracking it down.

So we either need to start paginating on the collections endpoint, or make the default large enough that it won't be an issue in most cases. Clearly, increasing the default isn't the best long-term approach, but I'm tempted to do that for 0.3.0 (STAC 0.7.0), as there's some work to be done on paging in the upcoming STAC version 0.8.0.
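For comparison, the pagination option could look roughly like the sketch below, which keeps requesting fixed-size pages until a short page signals the end. `search_page` is a hypothetical in-memory stand-in for a search issued with `from`/`size` offsets; it is not sat-api's actual code or the Elasticsearch client API.

```python
# Hypothetical stand-in for a paged search: returns hits [from_, from_ + size).
def search_page(index, from_=0, size=10):
    return index[from_:from_ + size]


def fetch_all(index, page_size=10):
    """Collect every hit by requesting successive pages of `page_size`."""
    results = []
    offset = 0
    while True:
        page = search_page(index, from_=offset, size=page_size)
        results.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return results


collections_index = [{"id": f"collection-{i}"} for i in range(25)]
print(len(fetch_all(collections_index)))  # 25
```

Note that with real Elasticsearch, `from + size` offset paging is capped by the `index.max_result_window` setting (10,000 by default), which is far beyond the number of collections a typical catalog holds.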

@matthewhanson I think an environment variable with a default value of 10 is a good short-term solution, since the user should know how many collections are in the catalog.
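A minimal sketch of the environment-variable approach, assuming a hypothetical variable named `SATAPI_COLLECTION_LIMIT` (the thread does not name one). The resolved value would then be passed as the `size` of the collections search.

```python
import os

# Default proposed in the thread; the variable name below is hypothetical.
DEFAULT_COLLECTION_LIMIT = 10


def collection_limit(env=None):
    """Read the collections search size from the environment, falling back to 10."""
    env = os.environ if env is None else env
    raw = env.get("SATAPI_COLLECTION_LIMIT", "").strip()
    if not raw:
        return DEFAULT_COLLECTION_LIMIT
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("SATAPI_COLLECTION_LIMIT must be a positive integer")
    return value


print(collection_limit({}))                                 # 10
print(collection_limit({"SATAPI_COLLECTION_LIMIT": "50"}))  # 50
```

Failing loudly on an invalid value seems preferable to silently falling back, since a wrong limit reproduces the original "has no collection" symptom.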

Sounds like a good plan @geospatial-jeff