Year Extract not Supported
joshua-frat opened this issue · 10 comments
Hi, I am executing the substrait plans created by tpc-h09, and I noticed the plan does not execute due to the extract call made. I'm not sure, but it looks like it is incomplete. Either that, or it could be replaced by a year call in the plan. Removing the year call in the Python allows the plan to execute to completion. Just wanted to let you know for future changes to the compiler, if this is the case. Thanks.
Thanks for reporting @joshua-frat !
Which substrait consumer are you executing against? And can you paste a copy of your tpc-h09 code?
I am executing against Facebook Velox, and my tpc-h09 code is as follows:
`def tpc_h09(part, supplier, lineitem, partsupp, orders, nation):
Doesn't work in Velox, extracting year not supported in compiler
q = lineitem
q = q.join(supplier, lineitem.l_suppkey == supplier.s_suppkey)
q = q.join(
partsupp,
(lineitem.l_suppkey == partsupp.ps_suppkey)
& (lineitem.l_partkey == partsupp.ps_partkey),
)
q = q.join(part, lineitem.l_partkey == part.p_partkey)
q = q.join(orders, lineitem.l_orderkey == orders.o_orderkey)
q = q.join(nation, supplier.s_nationkey == nation.n_nationkey)
one = ibis.literal(1, 'int32')
q = q.select(
q.p_name,
amount=lambda t: (t.l_extendedprice * (one - t.l_discount) - t.ps_supplycost * t.l_quantity),
o_year=lambda t: t.o_orderdate.year().cast("string"),
nation=lambda t: t.n_name,
)
q = q.filter([q.p_name.like("%GREEN%")])
gq = q.group_by([q.nation, q.o_year])
q = gq.aggregate(sum_profit=q.amount.sum())
q = q.sort_by([q.nation, ibis.desc(q.o_year)])
return q`
You may notice a few other things about this code. For one, Velox doesn't like casting constant literals, so operations involving the literal 1 had to be made as Int32 beforehand. Also, Aggregations that are more complex than sums, means, or counts do not work unless the intermediary steps are set as select statements for the inputs to the aggregation. Another noteworthy thing is that join key predicates must be present in their respective left or right side tables based on which side they're on in the predicate. For example, q.join(orders, lineitem.l_orderkey == orders.o_orderkey)
will work but q.join(orders, orders.o_orderkey == lineitem.l_orderkey)
will not. One other thing is that the isin directive does not execute in Velox. That's about all I have to say about how the compiler runs, sorry if it's a lot for this issue, but that's what I've found when trying to run it.
Thanks @joshua-frat -- that's helpful! Could you also paste in the python string representation of the substrait protobuf message? Not the byte-string but the representation that's almost JSON?
Sure, here it is:
{
"extensionUris": [
{
"extensionUriAnchor": 1
}
],
"extensions": [
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 1,
"name": "equal"
}
},
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 2,
"name": "and"
}
},
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 3,
"name": "like"
}
},
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 4,
"name": "subtract"
}
},
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 5,
"name": "multiply"
}
},
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 6,
"name": "extract"
}
},
{
"extensionFunction": {
"extensionUriReference": 1,
"functionAnchor": 7,
"name": "sum"
}
}
],
"relations": [
{
"root": {
"input": {
"sort": {
"input": {
"aggregate": {
"input": {
"project": {
"common": {
"emit": {
"outputMapping": [
50,
51,
52,
53
]
}
},
"input": {
"filter": {
"input": {
"join": {
"left": {
"join": {
"left": {
"join": {
"left": {
"join": {
"left": {
"join": {
"left": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": [
"l_orderkey",
"l_partkey",
"l_suppkey",
"l_linenumber",
"l_quantity",
"l_extendedprice",
"l_discount",
"l_tax",
"l_returnflag",
"l_linestatus",
"l_shipdate",
"l_commitdate",
"l_receiptdate",
"l_shipinstruct",
"l_shipmode",
"l_comment"
],
"struct": {
"types": [
{
"i64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"i64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"i64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"i32": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"date": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"date": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"date": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
{
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
}
],
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": [
"lineitem"
]
}
}
},
"right": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": [
"s_suppkey",
"s_name",
"s_address",
"s_nationkey",
"s_phone",
"s_acctbal",
"s_comment"
],
"struct": {
"types": [
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"fp64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
}
],
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": [
"supplier"
]
}
}
},
"expression": {
"scalarFunction": {
"functionReference": 1,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 2
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 16
}
},
"rootReference": {
}
}
}
}
]
}
},
"type": "JOIN_TYPE_INNER"
}
},
"right": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": [
"ps_partkey",
"ps_suppkey",
"ps_availqty",
"ps_supplycost",
"ps_comment"
],
"struct": {
"types": [
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"i32": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"fp64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
}
],
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": [
"partsupp"
]
}
}
},
"expression": {
"scalarFunction": {
"functionReference": 2,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"scalarFunction": {
"functionReference": 1,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 2
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 24
}
},
"rootReference": {
}
}
}
}
]
}
}
},
{
"value": {
"scalarFunction": {
"functionReference": 1,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 1
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 23
}
},
"rootReference": {
}
}
}
}
]
}
}
}
]
}
},
"type": "JOIN_TYPE_INNER"
}
},
"right": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": [
"p_partkey",
"p_name",
"p_mfgr",
"p_brand",
"p_type",
"p_size",
"p_container",
"p_retailprice",
"p_comment"
],
"struct": {
"types": [
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"i32": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"fp32": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
}
],
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": [
"part"
]
}
}
},
"expression": {
"scalarFunction": {
"functionReference": 1,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 1
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 28
}
},
"rootReference": {
}
}
}
}
]
}
},
"type": "JOIN_TYPE_INNER"
}
},
"right": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": [
"o_orderkey",
"o_custkey",
"o_orderstatus",
"o_totalprice",
"o_orderdate",
"o_orderpriority",
"o_clerk",
"o_shippriority",
"o_comment"
],
"struct": {
"types": [
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"fp64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"date": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
}
],
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": [
"orders"
]
}
}
},
"expression": {
"scalarFunction": {
"functionReference": 1,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 37
}
},
"rootReference": {
}
}
}
}
]
}
},
"type": "JOIN_TYPE_INNER"
}
},
"right": {
"read": {
"common": {
"direct": {
}
},
"baseSchema": {
"names": [
"n_nationkey",
"n_name",
"n_regionkey",
"n_comment"
],
"struct": {
"types": [
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"i64": {
"nullability": "NULLABILITY_REQUIRED"
}
},
{
"string": {
"nullability": "NULLABILITY_REQUIRED"
}
}
],
"nullability": "NULLABILITY_REQUIRED"
}
},
"namedTable": {
"names": [
"nation"
]
}
}
},
"expression": {
"scalarFunction": {
"functionReference": 1,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 19
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 46
}
},
"rootReference": {
}
}
}
}
]
}
},
"type": "JOIN_TYPE_INNER"
}
},
"condition": {
"scalarFunction": {
"functionReference": 3,
"outputType": {
"bool": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 29
}
},
"rootReference": {
}
}
}
},
{
"value": {
"literal": {
"string": "%GREEN%"
}
}
}
]
}
}
}
},
"expressions": [
{
"selection": {
"directReference": {
"structField": {
"field": 29
}
},
"rootReference": {
}
}
},
{
"scalarFunction": {
"functionReference": 4,
"outputType": {
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"scalarFunction": {
"functionReference": 5,
"outputType": {
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 5
}
},
"rootReference": {
}
}
}
},
{
"value": {
"scalarFunction": {
"functionReference": 4,
"outputType": {
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"literal": {
"i32": 1
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 6
}
},
"rootReference": {
}
}
}
}
]
}
}
}
]
}
}
},
{
"value": {
"scalarFunction": {
"functionReference": 5,
"outputType": {
"fp64": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 26
}
},
"rootReference": {
}
}
}
},
{
"value": {
"selection": {
"directReference": {
"structField": {
"field": 4
}
},
"rootReference": {
}
}
}
}
]
}
}
}
]
}
},
{
"cast": {
"type": {
"string": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"input": {
"scalarFunction": {
"functionReference": 6,
"outputType": {
"i32": {
"nullability": "NULLABILITY_NULLABLE"
}
},
"arguments": [
{
"enum": {
}
}
]
}
}
}
}
]
}
}
}
}
}
}
}
}
]
}
Thanks @joshua-frat -- can you confirm which version of ibis-substrait
you're using? As of the latest release, we should be including the YEAR
argument in the extract
call
I'm using V2.18.0
Huh -- and which version of Ibis are you using?
I would expect to see the following towards the end of your substrait plan:
scalar_function {
function_reference: 6
output_type {
i32 {
nullability: NULLABILITY_NULLABLE
}
}
arguments {
enum {
specified: "YEAR"
}
}
I'm using ibis 3.2.0
Hm, well, running your tpc_h09 expression with ibis-substrait 2.18.0 and ibis 3.2.0 on my end adds in the enum: YEAR
bit, so I'm not sure what's going on.
It might be worth trying to reinstall both libraries (ibis and ibis-substrait) and see if that removes the hiccup.
Thanks, I tried reinstalling ibis and it didn't work. Unfortunately I didn't install ibis-substrait as outlined in CONTRIBUTING.md and instead cloned the repository and used a modified version of the test files to output the protobuf binaries using pytest. Maybe the different environment has something to do with it or one of the files got modified, but I can try installing it and executing the test function differently and see if that has any effect. Will let you know on the results later if I do.