apache/datafusion-comet

Add support for bloom_filter_agg

Opened this issue · 5 comments

What is the problem the feature request solves?

Some TPC-H queries use bloom_filter_agg, and Comet does not have a native implementation yet.

A workaround is to set spark.sql.optimizer.runtime.bloomFilter.enabled=false.
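
For anyone applying the workaround from PySpark, a minimal sketch follows. The app name is a placeholder; the only setting taken from this issue is the config key itself.

```python
from pyspark.sql import SparkSession

# Disable the runtime bloom filter optimization so the planner does not
# inject bloom_filter_agg into query plans.
spark = (
    SparkSession.builder
    .appName("comet-bloom-filter-workaround")  # placeholder name
    .config("spark.sql.optimizer.runtime.bloomFilter.enabled", "false")
    .getOrCreate()
)

# The same setting can be changed on an already running session.
spark.conf.set("spark.sql.optimizer.runtime.bloomFilter.enabled", "false")
```

The same key can also be passed with --conf on the spark-submit command line.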

Describe the potential solution

No response

Additional context

No response

I can take this up if more details are provided :)

We need to implement an equivalent of Spark's org.apache.spark.sql.catalyst.expressions.aggregate.BloomFilterAggregate.
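
To make the shape of the work a bit more concrete, here is a rough, simplified sketch of what an aggregate like this has to do: update a buffer per input row, merge partial buffers, and produce a filter that can be probed. This is not Spark's or Comet's implementation. The class name, the default sizes, and the SHA-256 based hashing are all assumptions for illustration (Spark's filter uses its own hashing and serialized layout, which a native version would need to match).

```python
import hashlib


class BloomFilterAggSketch:
    """Toy aggregate buffer: a bit set plus k derived hash positions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        # Double hashing from a single SHA-256 digest.
        # Illustrative only: this is NOT the hashing scheme Spark uses.
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def update(self, value):
        """Per-row step: set the bits for one input value."""
        if value is None:  # assumption: null inputs are skipped
            return
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def merge(self, other):
        """Combine a partial buffer from another partition (bitwise OR)."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("cannot merge filters with different parameters")
        self.bits |= other.bits

    def might_contain(self, value):
        """Probe step: False means definitely absent, True means maybe present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_partial(values):
    """Simulate the partial aggregation over one partition."""
    buf = BloomFilterAggSketch()
    for v in values:
        buf.update(v)
    return buf


# Two "partitions" aggregated separately, then merged into the final filter.
partials = [build_partial([1, 2, 3]), build_partial([4, None, 5])]
final = BloomFilterAggSketch()
for p in partials:
    final.merge(p)
```

Because merging is a bitwise OR, the final filter has no false negatives for any value seen by any partition, which is the property the join-side probe relies on.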

ok, taking this up.

Accumulating some notes for this. Here's the Spark design doc on the feature:
https://docs.google.com/document/d/16IEuyLeQlubQkH8YuVuXWKo2-grVIoDJqQpHZrE7q04/

It looks like we already have the filter support thanks to #179.