Extent
domoritz opened this issue · 4 comments
It would be great to have an aggregation function that computes the difference between the max and the min value.
Is "extent" a good name? Other alternatives are "spread" or "range".
Why not simply generate max and min and then compute the difference? I'm sure an extent function could be slightly more convenient, but I'm not sure that justifies the additional surface area.
Makes sense. However, in Vega-Lite it is quite inconvenient to sort marks by an aggregate that is not a built-in op.
For example, I would like to sort varieties by a hypothetical "extent" op:
```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {"url": "data/barley.json"},
  "mark": "bar",
  "encoding": {
    "x": {
      "aggregate": "sum",
      "field": "yield",
      "type": "quantitative"
    },
    "y": {
      "field": "variety",
      "type": "nominal",
      "sort": {"op": "extent", "field": "yield"}
    }
  }
}
```
The only way to make this work is to compute the aggregates outside the encoding, in transforms. Because the summarize step leaves one row per variety, the total yield for the bar length has to be computed there as well:
```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v2.json",
  "data": {"url": "data/barley.json"},
  "mark": "bar",
  "transform": [
    {
      "summarize": [
        {"aggregate": "min", "field": "yield", "as": "min_yield"},
        {"aggregate": "max", "field": "yield", "as": "max_yield"},
        {"aggregate": "sum", "field": "yield", "as": "sum_yield"}
      ],
      "groupby": ["variety"]
    },
    {"calculate": "datum.max_yield - datum.min_yield", "as": "extent"}
  ],
  "encoding": {
    "x": {
      "field": "sum_yield",
      "type": "quantitative"
    },
    "y": {
      "field": "variety",
      "type": "nominal",
      "sort": {"op": "min", "field": "extent"}
    }
  }
}
```
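To make the data flow of the transform-based workaround concrete, here is a rough Python emulation over a few made-up rows (not taken from data/barley.json): group by variety, compute the min, max, and total yield, derive extent as max minus min, then order the varieties by that derived field. It only mirrors the spec's steps; it is not how Vega-Lite executes them, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(rows: List[dict]) -> List[dict]:
    """Group rows by variety and compute min, max, and sum of yield."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row["variety"]].append(row["yield"])
    return [
        {
            "variety": variety,
            "min_yield": min(values),
            "max_yield": max(values),
            "sum_yield": sum(values),
        }
        for variety, values in groups.items()
    ]


def calculate_extent(summary: List[dict]) -> List[dict]:
    """Counterpart of the calculate step: datum.max_yield - datum.min_yield."""
    return [{**d, "extent": d["max_yield"] - d["min_yield"]} for d in summary]


# Made-up illustrative rows.
rows = [
    {"variety": "A", "yield": 20.0},
    {"variety": "A", "yield": 35.5},
    {"variety": "B", "yield": 28.0},
    {"variety": "B", "yield": 30.0},
    {"variety": "C", "yield": 10.0},
    {"variety": "C", "yield": 19.0},
]

table = calculate_extent(summarize(rows))
# Sort ascending by extent, assuming the y-channel sort uses its default ascending order.
# After summarizing there is one row per variety, so the "min" op just picks that row's value.
order = [d["variety"] for d in sorted(table, key=lambda d: d["extent"])]
print(order)  # ['B', 'C', 'A']
```

The extents here are 15.5 (A), 2.0 (B), and 9.0 (C), which gives the printed order.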
Even if we don't add "extent", this is an interesting example.
Thanks @domoritz, now I understand the motivation for this a bit better. However, I don't think this is a scalable strategy. We could add an extent (or span) operation for max - min. Next, someone (quite reasonably) also wants the IQR span (q3 - q1), so we add that too, and so on. As a result, I don't think adding new Vega-level aggregate ops one at a time is a good strategy.
If you'd like to extend Vega-Lite with additional aggregate ops that compile to an aggregate + formula at the Vega level, that is one option. A more attractive option might be to allow aggregate formulas (e.g., max - min) in addition to aggregate functions.
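One way to picture the "compile to an aggregate + formula" option is a hypothetical helper, `compile_span`, that expands a span-style op into a Vega aggregate transform (using the built-in min/max or q1/q3 ops) followed by a formula transform. The helper, the op names `span` and `iqr_span`, and the output field naming scheme are all assumptions for illustration, not part of Vega-Lite.

```python
from typing import Dict, List, Tuple

# Hypothetical span ops, each defined as (upper op, lower op) over built-in Vega aggregate ops.
SPAN_OPS: Dict[str, Tuple[str, str]] = {
    "span": ("max", "min"),      # max - min
    "iqr_span": ("q3", "q1"),    # interquartile range, q3 - q1
}


def compile_span(op: str, field: str, groupby: List[str]) -> List[dict]:
    """Expand a hypothetical span op into Vega aggregate + formula transforms.

    Assumes `field` is a plain identifier so that `datum.<name>` is a valid expression.
    """
    upper, lower = SPAN_OPS[op]
    upper_as, lower_as = f"{upper}_{field}", f"{lower}_{field}"
    return [
        {
            "type": "aggregate",
            "groupby": groupby,
            "fields": [field, field],
            "ops": [upper, lower],
            "as": [upper_as, lower_as],
        },
        {
            "type": "formula",
            "expr": f"datum.{upper_as} - datum.{lower_as}",
            "as": f"{op}_{field}",
        },
    ]


for transform in compile_span("iqr_span", "yield", ["variety"]):
    print(transform)
```

Every new span-like op still needs its own entry in a table like `SPAN_OPS`, which is exactly the scaling concern; a general aggregate-formula syntax would let users write the upper/lower expression themselves.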
Also, FWIW, I think of extent as referring to [min, max] (a 2-tuple) and span as referring to the magnitude of the extent (max - min).
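The extent/span distinction can be sketched in a few lines of Python: `extent` returns the (min, max) pair and `span` is derived from it as max - min. Both function names are illustrative here, not an existing Vega or Vega-Lite API.

```python
from typing import Iterable, Tuple


def extent(values: Iterable[float]) -> Tuple[float, float]:
    """Extent: the (min, max) pair, computed in a single pass."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("extent of an empty sequence is undefined") from None
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:
            hi = v
    return lo, hi


def span(values: Iterable[float]) -> float:
    """Span: the magnitude of the extent, max - min."""
    lo, hi = extent(values)
    return hi - lo


print(extent([3, 7, 1, 4]))  # (1, 7)
print(span([3, 7, 1, 4]))    # 6
```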
I like the idea of aggregate formulas, but we can address that in Vega 3.1 and Vega-Lite 2.1.
The distinction between extent and span makes sense. I will adopt those terms.