feat: Add support for range-based OVER window semantics
What happened?
Ibis supports both row-based and range-based OVER window semantics, as shown below:
import ibis
from ibis import _
bid_schema = ibis.schema(
{
"auction": "int64",
"bidder": "int64",
"price": "float64",
"datetime": "timestamp(3)"
}
)
bid_table = ibis.table(name="Bid", schema=bid_schema)
# range-based OVER window
range_window = ibis.window(group_by=_.auction, range=(-ibis.interval(seconds=10), 0))
bid_table.filter(_ is not None)[_.price.mean().over(range_window, order_by=_.datetime).name("avg_price")]
# row-based OVER window
row_window = ibis.window(group_by=_.auction, preceding=5, following=5, order_by=_.datetime)
bid_table.filter(_ is not None)[_.price.mean().over(row_window, order_by=_.datetime).name("avg_price")]
It looks like ibis-substrait errors out when generating the corresponding Substrait plan for the range window.
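For anyone less familiar with the distinction, here is a minimal pure-Python sketch (not ibis or Substrait code) of how the two frame types select rows. The bid data and the helper names `rows_avg` and `range_avg` are made up for illustration; the frame sizes are also smaller than in the snippet above so the difference shows up on four rows.

```python
from datetime import datetime, timedelta

# Made-up bids for a single auction, already sorted by time: (datetime, price).
bids = [
    (datetime(2024, 1, 1, 0, 0, 0), 10.0),
    (datetime(2024, 1, 1, 0, 0, 4), 20.0),
    (datetime(2024, 1, 1, 0, 0, 30), 30.0),
    (datetime(2024, 1, 1, 0, 0, 33), 40.0),
]


def rows_avg(rows, i, preceding, following):
    """Average price over a frame counted in rows around index i."""
    lo = max(0, i - preceding)
    hi = min(len(rows), i + following + 1)
    frame = [price for _, price in rows[lo:hi]]
    return sum(frame) / len(frame)


def range_avg(rows, i, lookback):
    """Average price over rows whose timestamp lies in [t_i - lookback, t_i]."""
    t_i = rows[i][0]
    frame = [price for t, price in rows if t_i - lookback <= t <= t_i]
    return sum(frame) / len(frame)


# ROWS frame: one row back, regardless of how far apart the bids are in time.
row_based = [rows_avg(bids, i, preceding=1, following=0) for i in range(len(bids))]

# RANGE frame: everything within the previous 10 seconds of the current bid.
range_based = [range_avg(bids, i, timedelta(seconds=10)) for i in range(len(bids))]

print(row_based)
print(range_based)
```

The two results differ only for the third bid: the row-based frame still includes the bid from 26 seconds earlier, while the range-based frame drops it. A range frame therefore needs a bound expressed in the units of the ordering column rather than a row count.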
What version of ibis-substrait are you using?
4.0.1
What substrait consumer(s) are you using, if any?
N/A
Relevant log output
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[4], line 1
----> 1 plan = compiler.compile(window_avg_price)
2 with open("window_avg_price.proto", "wb") as f:
3 f.write(plan.SerializeToString())
File /opt/conda/envs/composable-data-arch/lib/python3.12/site-packages/ibis_substrait/compiler/core.py:222, in SubstraitCompiler.compile(self, expr, **kwargs)
217 from .translate import translate
219 expr_schema = expr.schema()
220 rel = stp.PlanRel(
221 root=stalg.RelRoot(
--> 222 input=translate(expr.op(), compiler=self, **kwargs),
223 names=translate(expr_schema).names,
224 )
225 )
226 ver = vparse(__substrait_version__)
227 return stp.Plan(
228 version=stp.Version(
229 major_number=ver.major,
(...)
256 relations=[rel],
257 )
File /opt/conda/envs/composable-data-arch/lib/python3.12/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
...
442 return translate_preceding(boundary.value.value) # type: ignore
443 else:
--> 444 return translate_following(boundary.value.value)
AttributeError: 'Cast' object has no attribute 'value'
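One possible reading of the last frame: `boundary.value.value` assumes the bound wraps a literal node, but here it wraps a `Cast`, presumably because the `0` in the range tuple gets cast to match the interval bound. The sketch below uses stand-in classes, not the real ibis operation classes, and the helper `literal_value` is hypothetical. It only shows how a translator could peel off cast wrappers before reading the value.

```python
from dataclasses import dataclass


# Stand-ins for expression nodes. These are NOT the real ibis classes.
@dataclass
class Literal:
    value: object


@dataclass
class Cast:
    arg: object  # the wrapped node
    to: str      # target type name


def literal_value(node):
    """Peel off Cast wrappers until a Literal is reached, then return its value."""
    while isinstance(node, Cast):
        node = node.arg
    if not isinstance(node, Literal):
        raise TypeError(f"expected a literal window bound, got {type(node).__name__}")
    return node.value


# A bound shaped like the one in the traceback: a literal wrapped in a cast.
bound = Cast(arg=Literal(0), to="interval('s')")
print(literal_value(bound))
```

Unwrapping the cast would only avoid the `AttributeError`. It does not by itself say how an interval-valued bound should be represented in the plan.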
This turned up a few issues in how we were translating window functions that I can address.
But there's one thing that I cannot address -- as far as I can tell, there is no way to specify an interval for a window boundary in Substrait. I could be wrong about this, but reading over the spec, it appears to consider only an integer number of rows:
message Preceding {
// A strictly positive integer specifying the number of records that
// the window extends back from the current record. Required. Use
// CurrentRow for offset zero and Following for negative offsets.
int64 offset = 1;
}
// Defines that the bound extends this far ahead of the current record.
message Following {
// A strictly positive integer specifying the number of records that
// the window extends ahead of the current record. Required. Use
// CurrentRow for offset zero and Preceding for negative offsets.
int64 offset = 1;
}
@EpsilonPrime, do you have thoughts on @gforsyth's question about supporting an interval for a window boundary in Substrait?
Would specifying the BoundsType as BOUNDS_TYPE_RANGE work here?
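To make the question concrete: if a consumer interpreted the `int64 offset` under `BOUNDS_TYPE_RANGE` as a distance in the units of the ordering column (an assumption on my part, the proto comments quoted above only talk about records), then a 10-second bound over a `timestamp(3)` column could be written as an integer number of milliseconds. A rough sketch of that conversion, where `interval_to_offset` is a hypothetical helper and not part of ibis-substrait:

```python
from datetime import timedelta


def interval_to_offset(interval: timedelta, precision: int) -> int:
    """Express an interval as an integer count of 10**-precision second units.

    ASSUMPTION: the consumer reads the offset in the ordering column's units.
    """
    units_per_second = 10 ** precision
    micros = interval // timedelta(microseconds=1)  # whole microseconds as int
    offset, remainder = divmod(micros * units_per_second, 1_000_000)
    if remainder:
        raise ValueError("interval is not representable at this precision")
    return offset


# 10 seconds over a timestamp(3) column, i.e. millisecond precision.
preceding_offset = interval_to_offset(timedelta(seconds=10), precision=3)
print(preceding_offset)
```

Whether this is a valid encoding depends entirely on how consumers define range bounds for non-integer ordering keys, so it would need to be confirmed against the spec before relying on it.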