Too many tasks when reading a distributed partitioned table with a partition key filter
ScalaFirst opened this issue · 1 comment
ScalaFirst commented
dependency:
com.github.housepower
clickhouse-spark-runtime-3.2_2.12
0.5.0
My SQL is:
`select * from xxx where dt = '2023-03-09'` (dt is my partition key)
The table has no data for this date, so I expected the query to complete quickly, but it does not.
The planner log shows the filter being pushed down:
Pushing operators to label_platform.ch_label_crowd_export
Pushed Filters: EqualTo(dt,2023-03-09)
Post-Scan Filters:
yet the total task count is 1124.
I think the total should ideally be a single task, because there is no data for dt = '2023-03-09'.
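For reference, a minimal reproduction sketch of how the query is issued through Spark SQL. The catalog class name and connection settings below are assumptions for illustration (check the connector docs for the exact keys in 0.5.0); the table name matches the log above.

```scala
import org.apache.spark.sql.SparkSession

// Reproduction sketch only: the catalog class and connection settings are
// illustrative assumptions; adjust them to match the 0.5.0 connector docs.
object ReadEmptyPartition {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clickhouse-partition-filter-repro")
      .config("spark.sql.catalog.clickhouse", "xenon.clickhouse.ClickHouseCatalog") // assumed class name
      .config("spark.sql.catalog.clickhouse.host", "127.0.0.1")                     // placeholder host
      .getOrCreate()

    // dt is the table's partition key; this date holds no data,
    // yet the scan is still planned with 1124 tasks.
    spark.sql(
      """SELECT *
        |FROM clickhouse.label_platform.ch_label_crowd_export
        |WHERE dt = '2023-03-09'""".stripMargin)
      .show()

    spark.stop()
  }
}
```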
pan3793 commented
That's a good point. We could collect more metrics during the planning phase and eliminate task assignments for partitions that do not contain any data.
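A minimal sketch of the idea, not the connector's actual planning code: ClickHouse's `system.parts` table exposes per-part row counts, so the planner could look up which partitions actually hold rows and skip the rest. The JDBC URL, object name, and `partitionValue` field are placeholders for illustration.

```scala
import java.sql.DriverManager
import scala.collection.mutable

// Illustrative only: query system.parts to find partitions that contain rows,
// so task assignment could be limited to those partitions.
object NonEmptyPartitions {
  def fetch(jdbcUrl: String, database: String, table: String): Set[String] = {
    val conn = DriverManager.getConnection(jdbcUrl) // placeholder URL/credentials
    try {
      val stmt = conn.prepareStatement(
        """SELECT partition, sum(rows) AS total_rows
          |FROM system.parts
          |WHERE database = ? AND table = ? AND active
          |GROUP BY partition
          |HAVING total_rows > 0""".stripMargin)
      stmt.setString(1, database)
      stmt.setString(2, table)
      val rs = stmt.executeQuery()
      val partitions = mutable.Set.empty[String]
      while (rs.next()) partitions += rs.getString("partition")
      partitions.toSet
    } finally conn.close()
  }
}

// During planning, only candidate partitions present in this set would get tasks, e.g.:
// val nonEmpty = NonEmptyPartitions.fetch("jdbc:clickhouse://host:8123",
//   "label_platform", "ch_label_crowd_export")
// val inputPartitions = candidatePartitions.filter(p => nonEmpty.contains(p.partitionValue))
```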