ibis-project/ibis-substrait

feat: Add support for range-based OVER window semantics

Opened this issue · 3 comments

What happened?

Ibis supports both row-based and range-based OVER window semantics, showing below:

import ibis
from ibis import _

bid_schema = ibis.schema(
    {
        "auction": "int64",
        "bidder": "int64",
        "price": "float64",
        "datetime": "timestamp(3)"
    }
)
bid_table = ibis.table(name="Bid", schema=bid_schema)

# range-based OVER window
range_window = ibis.window(group_by=_.auction, range=(-ibis.interval(seconds=10), 0))
bid_table.filter(_ is not None)[_.price.mean().over(range_window, order_by=_.datetime).name("avg_price")]

# row-based OVER window
row_window =ibis.window(group_by=_.auction, preceding=5, following=5, order_by=_.datetime)
bid_table.filter(_ is not None)[_.price.mean().over(row_window, order_by=_.datetime).name("avg_price")]

It looks like ibis-substrate errors out generating the corresponding substrait plan for the range window.

What version of ibis-substrait are you using?

4.0.1

What substrait consumer(s) are you using, if any?

N/A

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 1
----> 1 plan = compiler.compile(window_avg_price)
      2 with open("window_avg_price.proto", "wb") as f:
      3     f.write(plan.SerializeToString())

File /opt/conda/envs/composable-data-arch/lib/python3.12/site-packages/ibis_substrait/compiler/core.py:222, in SubstraitCompiler.compile(self, expr, **kwargs)
    217 from .translate import translate
    219 expr_schema = expr.schema()
    220 rel = stp.PlanRel(
    221     root=stalg.RelRoot(
--> 222         input=translate(expr.op(), compiler=self, **kwargs),
    223         names=translate(expr_schema).names,
    224     )
    225 )
    226 ver = vparse(__substrait_version__)
    227 return stp.Plan(
    228     version=stp.Version(
    229         major_number=ver.major,
   (...)
    256     relations=[rel],
    257 )

File /opt/conda/envs/composable-data-arch/lib/python3.12/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
...
    442     return translate_preceding(boundary.value.value)  # type: ignore
    443 else:
--> 444     return translate_following(boundary.value.value)

AttributeError: 'Cast' object has no attribute 'value'

This turned up a few issues in how we were translating window functions that I can address.

But there's one thing that I cannot address -- as far as I can tell, there is no way to specify an interval for a window boundary in Substrait. I could be wrong about this, but reading over the spec, it appears to only consider integer number of rows:

      message Preceding {
        // A strictly positive integer specifying the number of records that
        // the window extends back from the current record. Required. Use
        // CurrentRow for offset zero and Following for negative offsets.
        int64 offset = 1;
      }

      // Defines that the bound extends this far ahead of the current record.
      message Following {
        // A strictly positive integer specifying the number of records that
        // the window extends ahead of the current record. Required. Use
        // CurrentRow for offset zero and Preceding for negative offsets.
        int64 offset = 1;
      }

@EpsilonPrime, do you have thoughts on @gforsyth's question regarding supporting interval for a window boundary in Substrait?

Would specifying the BoundsType as BOUNDS_TYPE_RANGE work here?