stac-extensions/classification

Representing categorical counts in raster band stats

duckontheweb opened this issue · 3 comments

Not sure if this is best discussed here or in the Classification Extension, but I'm posting here to get things started...

The Raster Band Statistics object allows us to represent information about the distribution of values in the band, including minimum, maximum, mean, and standard deviation. This works well for summarizing continuous data, but for categorical rasters (e.g. land cover) a more useful statistic might be the pixel count by value. We could theoretically represent this using a Histogram where min == max for each bucket, but in cases where only a few of the possible values are represented in a given band (e.g. a raster that only contains water and forest) this would lead to a very verbose and sparse Histogram object (most bins would have a count of 0). EDIT: I misunderstood how the Histogram object is constructed; I don't think it is possible to represent categories in this way.

Are there suggestions for how best to handle this case? Would it be appropriate to create a Histogram object with only buckets representing the values actually found in the raster?
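To make the "pixel count by value" idea concrete, here is a minimal sketch of how such counts might be computed with NumPy. The function name `categorical_counts`, the land cover codes, and the nodata value are all hypothetical and are not part of any STAC extension; the output is just a plain mapping that only includes values actually found in the band.

```python
import numpy as np


def categorical_counts(band, nodata=None):
    """Return {pixel value: count} for the values actually present in the band."""
    values = np.asarray(band).ravel()
    if nodata is not None:
        # Drop nodata pixels so they do not show up as a category.
        values = values[values != nodata]
    unique, counts = np.unique(values, return_counts=True)
    # Cast to plain ints so the result is JSON-serializable.
    return {int(v): int(c) for v, c in zip(unique, counts)}


# Hypothetical land cover codes: 1 = water, 2 = forest, 0 = nodata.
band = np.array([[1, 1, 2], [2, 2, 0]], dtype=np.uint8)
counts = categorical_counts(band, nodata=0)
```

Because only observed values are reported, a raster containing just water and forest yields two entries rather than one bin per possible class.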

The stats object is highly inspired by the -stats function of gdal. What does gdal do with discrete data?

gdalinfo reports the same stats (min, max, mean, stdev, and valid percent) regardless of whether the raster represents discrete or continuous data. This makes sense because GDAL does not have any way to auto-detect whether a raster represents continuous or discrete data unless it were to assume that any rasters with an integer data type were discrete. If we want to keep the Stats Object mostly in line with GDAL -stats functionality I can raise this issue in the Classification Extension, since that deals explicitly with categorical data.
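As a rough illustration of why these summary statistics say little about categorical data, the sketch below computes the same five values with NumPy for a tiny made-up land cover band. This is not GDAL's implementation; the function name `summary_stats`, the dictionary keys, the class codes, and the use of a population standard deviation are assumptions chosen for the example.

```python
import numpy as np


def summary_stats(band, nodata=None):
    """Compute min, max, mean, stddev, and valid percent for a band."""
    values = np.asarray(band).ravel()
    valid = values if nodata is None else values[values != nodata]
    return {
        "minimum": float(valid.min()),
        "maximum": float(valid.max()),
        "mean": float(valid.mean()),
        # Population standard deviation (ddof=0); an assumption for this sketch.
        "stddev": float(valid.std()),
        "valid_percent": 100.0 * valid.size / values.size,
    }


# Hypothetical land cover codes: 1 = water, 2 = forest, 0 = nodata.
band = np.array([[1, 1, 2], [2, 2, 0]], dtype=np.uint8)
stats = summary_stats(band, nodata=0)
```

Here the mean falls between the codes for water and forest, which does not correspond to any class, whereas a per-value pixel count would describe the band directly.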

Yes indeed, classification is a better place for that information. @duckontheweb could you please add a reference to the corresponding issue in the Classification Extension so we can close this one?