yaooqinn/spark-ranger

spark pushdown and partitionFilter not working

Closed this issue · 9 comments

Row filtering and column masking are helpful.
But with them enabled, Spark can't do pushdown or partition filtering (as seen in the physical plan).
Do you have any solution, @yaooqinn?

I think Spark doesn't know how to optimize across RangerSparkRowFilter and RangerSparkMasking, so we could let the original plan be optimized before injecting the two custom optimizations (row filter and masking).

Ah, thanks for reporting this. I will investigate it soon.

I fixed it by injecting an optimizer extension that runs after row filtering and column masking have happened.
It uses a transform to remove all the marker nodes (RangerSparkRowFilter and RangerSparkMasking); after that, Spark's physical planner does the rest.
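The approach described above could look roughly like the following. This is a minimal sketch, not the actual patch: `StripRangerMarkers` and `StripRangerMarkersExtension` are hypothetical names, the markers are matched by node name so the sketch does not depend on the plugin's package layout, and it assumes both markers are unary nodes that only wrap their child.

```scala
import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: drop the Ranger marker nodes so that Spark's own rules
// and the physical planner see the Filter directly above the relation.
object StripRangerMarkers extends Rule[LogicalPlan] {
  // Node names as they appear in the plan output in this thread.
  val markerNames: Set[String] = Set("RangerSparkRowFilter", "RangerSparkMasking")

  // transformUp rewrites children first, so nested markers
  // (RowFilter over Masking) are both removed in a single pass.
  override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
    // Assumption: each marker is a UnaryNode that only wraps its child.
    case marker: UnaryNode if markerNames.contains(marker.nodeName) => marker.child
  }
}

// Hypothetical extension class; it would be registered after the Ranger extensions.
class StripRangerMarkersExtension extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit =
    extensions.injectOptimizerRule(_ => StripRangerMarkers)
}
```

Such a class could be enabled through the `spark.sql.extensions` setting or `SparkSession.builder.withExtensions`. If the Ranger rules rely on the markers to avoid being applied twice (an assumption to verify against the plugin), the markers should only be removed once those rules no longer run.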

I can open a pull request if you want, @yaooqinn.

pg20 commented

Hey, we are facing the same issue. The plan looks like this for the query below:
val statement = "select * from table-name where year='2019' and month='05' and day = '23' and hour = '02' limit 10"

Project [code#1058, ... 93 more fields]
+- GlobalLimit 10
   +- LocalLimit 10
      +- Filter (((((((isnotnull(year#1172) && isnotnull(hour#1169)) && isnotnull(month#1173)) && isnotnull(day#1174)) && (year#1172 = 2019)) && (month#1173 = 05)) && (day#1174 = 23)) && (hour#1169 = 02))
         +- RangerSparkRowFilter
            +- RangerSparkMasking
               +- Relation[code#1058,... 93 more fields] parquet
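One way to check whether partition pruning survives is to look at the file scan in the physical plan rather than at the logical plan. The helper below is a sketch under assumptions: `PlanChecks.partitionFiltersOf` is a hypothetical name, and it relies on Spark's internal `FileSourceScanExec` node, which applies to file-based sources such as Parquet read through the built-in data source path and may change between Spark versions.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.execution.FileSourceScanExec

object PlanChecks {
  // Hypothetical helper: return the partition filters attached to every file
  // scan in the planned physical plan, rendered as strings.
  def partitionFiltersOf(df: DataFrame): Seq[String] =
    df.queryExecution.sparkPlan.collect {
      case scan: FileSourceScanExec => scan.partitionFilters.map(_.toString)
    }.flatten
}
```

For the query above, and assuming `year`, `month`, `day` and `hour` are the table's partition columns and `spark` is the active session, an empty result from `PlanChecks.partitionFiltersOf(spark.sql(statement))` would suggest that the predicates did not reach the scan as partition filters.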

@brucemen711 Can you share the pull request so we can have a look?

@brucemen711
Can you please share the pull request? We have the same issue with partition pruning.

Hi guys, this project has become a sub-module of Apache Submarine, so please go to https://github.com/apache/submarine for further verification and discussion. This problem might be fixed there. Sorry for the inconvenience, and thanks.

@yaooqinn
I took the fix from Submarine and applied it to our fork. All tests with partition pruning seem to be working fine. Thank you.

Sorry for the late response; glad to see @yaooqinn resolved it.
Maybe we should archive this project.

Thanks guys. The Apache Submarine project is a much more official place to maintain this, and there will be a Maven Central artifact for it with the next release of Submarine.

I will keep this project unarchived for a while.