stac-utils/stac-index

Add Statistics

m-mohr opened this issue · 4 comments

Make a page that shows statistics that are useful for STAC in general, or that show the state of the crawling/database:

  • Number of root catalogs implementing specific STAC versions
  • Number of root catalogs implementing specific (catalog/item/collection) STAC extensions
  • Number of crawled collections
  • Number of catalogs/items/collections pending in the queue (per type and in total)
  • Number of errors (?) / list of errored documents to hunt for bugs
  • Most common keywords (tag cloud)
  • List of licenses?
  • Min/max temporal extent? Temporal "heatmap"?
  • Spatial heatmap of data (i.e., where most data is located)
  • Number of collections per root catalog
  • Number of collections implementing summaries
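
Several of the statistics listed above are simple aggregations over the crawled documents. The following is a minimal sketch, assuming the crawler keeps each root catalog and each collection as its parsed STAC JSON. The function names `collect_statistics` and `_parse` are hypothetical. It only counts extensions declared directly on the root document, so it does not yet aggregate extensions from child catalogs, collections or items. The field names (`stac_version`, `stac_extensions`, `keywords`, `license`, `summaries`, `extent.temporal.interval`) are the ones defined by the STAC spec.

```python
from collections import Counter
from datetime import datetime
from typing import Any, Optional


def _parse(ts: Optional[str]) -> Optional[datetime]:
    # RFC 3339 timestamp; None (JSON null) marks an open-ended interval.
    if ts is None:
        return None
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def collect_statistics(root_catalogs: list[dict[str, Any]],
                       collections: list[dict[str, Any]]) -> dict[str, Any]:
    """Aggregate a subset of the proposed statistics (hypothetical helper)."""
    # Root catalogs per STAC version, e.g. {"1.0.0": 2, "0.9.0": 1}.
    versions = Counter(c.get("stac_version", "unknown") for c in root_catalogs)
    # Root catalogs per declared extension; set() so each catalog counts once.
    extensions = Counter(
        ext for c in root_catalogs for ext in set(c.get("stac_extensions", []))
    )
    # Keyword frequencies across collections, usable as tag-cloud input.
    keywords = Counter(kw.lower() for c in collections for kw in c.get("keywords", []))
    licenses = Counter(c.get("license", "unknown") for c in collections)
    with_summaries = sum(1 for c in collections if c.get("summaries"))

    # Overall temporal range: earliest start and latest end over all collections.
    starts, ends = [], []
    for c in collections:
        intervals = c.get("extent", {}).get("temporal", {}).get("interval", [])
        if intervals:
            # The first interval describes the collection's overall extent.
            start, end = (_parse(t) for t in intervals[0])
            if start is not None:
                starts.append(start)
            # Open (null) ends are skipped here, although they arguably mean "ongoing".
            if end is not None:
                ends.append(end)

    return {
        "versions": dict(versions),
        "extensions": dict(extensions),
        "top_keywords": keywords.most_common(10),
        "licenses": dict(licenses),
        "collections": len(collections),
        "collections_with_summaries": with_summaries,
        "temporal_min": min(starts, default=None),
        "temporal_max": max(ends, default=None),
    }
```

The queue and error counts are left out on purpose. They depend on how the crawler stores its state, and the thread does not describe that.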

@cholmes Anything specific you'd be interested in?

This is awesome. Having the total number of catalogs, and eventually the number of items, will be awesome to see. When we have our next 'data sprint' we'd aim to increase that number.

I'll think about more stats that would be nice, but this looks like a great initial list. As discussed on the call, it'd be great to have a 'view' by STAC extension, so you could go to something like stacindex.org/extensions/eo and see links to all the root catalogs that implement eo.

Great, thanks.


I've added #10 for such a list of catalogs/collections by extension. We already have that for ecosystem tools.
At some point I may even add an overview by extension that lists both catalogs and ecosystem tools on one page, see #11.
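
A per-extension page like the one proposed in #10/#11 mostly needs an inverted index from extension to catalogs and tools. The sketch below is one possible approach under stated assumptions. The function name `index_by_extension` is hypothetical, and so is the shape of the tool data (a tool name mapped to the extensions it supports). It also assumes catalogs are keyed by their root URL.

```python
from collections import defaultdict
from typing import Any


def index_by_extension(root_catalogs: dict[str, dict[str, Any]],
                       tools: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Group root catalog URLs and ecosystem tool names by extension.

    root_catalogs: catalog URL -> parsed root document (assumed shape).
    tools: tool name -> supported extensions (hypothetical shape).
    """
    index: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"catalogs": [], "tools": []}
    )
    for url, doc in root_catalogs.items():
        # Deduplicate so a catalog is listed at most once per extension.
        for ext in sorted(set(doc.get("stac_extensions", []))):
            index[ext]["catalogs"].append(url)
    for name, exts in tools.items():
        for ext in sorted(set(exts)):
            index[ext]["tools"].append(name)
    return dict(index)
```

A page such as stacindex.org/extensions/eo could then render `index["eo"]`, listing catalogs and tools together.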

Having total number of catalogs and eventually number of items will be awesome to see

Of course, that only covers public catalogs. For example, items from Radiant ML Hub or Sentinel Hub are missing because they require authentication.