Representing categorical counts in raster band stats
duckontheweb opened this issue · 3 comments
Not sure if this is best discussed here or in the Classification Extension, but I'm posting here to get things started...
The Raster Band Statistics object allows us to represent information about the distribution of values in the band, including minimum, maximum, mean, and standard deviation. This works well for summarizing continuous data, but for categorical rasters (e.g. land cover) a more useful statistic might be the pixel count by value. We could theoretically represent this using a Histogram where min == max for each bucket, but in cases where only a few of the possible values are represented in a given band (e.g. a raster only contains water and forest) this would lead to a very verbose and sparse Histogram object (most bins would have a count of 0).
EDIT: I misunderstood how the Histogram object is constructed; I don't think it is possible to represent categories in this way.
Are there suggestions for how best to handle this case? Would it be appropriate to create a Histogram object with only buckets representing the values actually found in the raster?
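To make the "pixel count by value" idea concrete, here is a minimal sketch of a sparse count that only lists values actually present in the band. The function name count_by_value, the class codes, and the output shape are all hypothetical and not part of any STAC extension; the band is a plain nested list so the example has no dependencies.

```python
from collections import Counter
from itertools import chain


def count_by_value(rows, nodata=None):
    """Return {pixel value: count} for values actually present in the band."""
    # Flatten the 2D band and tally every pixel value.
    counts = Counter(chain.from_iterable(rows))
    # Drop the nodata value, if one was given and it occurs.
    counts.pop(nodata, None)
    return dict(sorted(counts.items()))


# Hypothetical land cover band: 1 = water, 4 = forest, 0 = nodata
band = [
    [1, 1, 4],
    [4, 4, 0],
]
print(count_by_value(band, nodata=0))  # {1: 2, 4: 3}
```

With a NumPy array, numpy.unique(band, return_counts=True) would give the same values and counts.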
The stats object is heavily inspired by the -stats function of gdal. What does GDAL do with discrete data?
gdalinfo reports the same stats (min, max, mean, stdev, and valid percent) regardless of whether the raster represents discrete or continuous data. This makes sense because GDAL has no way to auto-detect whether a raster represents continuous or discrete data, unless it were to assume that any raster with an integer data type is discrete. If we want to keep the Stats Object mostly in line with GDAL -stats functionality, I can raise this issue in the Classification Extension, since that deals explicitly with categorical data.
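As a rough illustration of why these summary stats say little about categorical data, the sketch below computes them for a handful of class codes. It is not GDAL's implementation: the function name summary_stats and the sample pixels are hypothetical, the dictionary keys are assumed names, and the use of population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def summary_stats(values, nodata=None):
    """Compute continuous-style summary stats, ignoring nodata pixels."""
    valid = [v for v in values if v != nodata]
    return {
        "minimum": min(valid),
        "maximum": max(valid),
        "mean": mean(valid),
        "stddev": pstdev(valid),  # population stddev, chosen as an assumption
        "valid_percent": 100 * len(valid) / len(values),
    }


# Hypothetical class codes: 1 = water, 4 = forest, 0 = nodata
pixels = [1, 1, 4, 4, 4, 0]
stats = summary_stats(pixels, nodata=0)
print(stats["minimum"], stats["maximum"], stats["mean"])
```

The mean here falls between the two class codes and corresponds to no actual class, which is the gap a count by value would fill.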
Yes indeed, classification is a better place for that information. @duckontheweb could you please add a reference to the item in the classification extension so we can close the issue?