Bug: Enabling predicate pushdown fails
mina-asham opened this issue · 2 comments
mina-asham commented
Hi,
I am unable to enable the predicate pushdown feature; I get this error:

```
Could not find an implementation of com.google.cloud.spark.bigquery.pushdowns.SparkBigQueryPushdown that supports Spark version 3.3.2
```
Here is how I can reproduce:
Variables:
- MY_PROJECT: project ID
- MY_CLUSTER: the cluster ID generated in step 1
- Create cluster:

```shell
gcloud dataproc clusters create cluster-70cf \
  --project $MY_PROJECT \
  --image-version 2.1-debian11 \
  --metadata SPARK_BQ_CONNECTOR_VERSION=0.33.0 \
  --region us-central1 \
  --master-machine-type n2-standard-4 --master-boot-disk-size 100 \
  --num-workers 2 --worker-machine-type n2-standard-4 --worker-boot-disk-size 100
```
- Script to run (saved at `/tmp/run.py`):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Enable predicate pushdown via the connector's JVM-side utility
spark.sparkContext._jvm.com.google.cloud.spark.bigquery.BigQueryConnectorUtils.enablePushdownSession(spark._jsparkSession)
```
- Run script:

```shell
gcloud dataproc jobs submit pyspark /tmp/run.py \
  --project=$MY_PROJECT \
  --cluster=$MY_CLUSTER \
  --region=us-central1
```
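For context on the error itself: the connector selects a pushdown implementation based on the running Spark version, and the error indicates that connector 0.33.0 has no implementation that supports Spark 3.3.2 (the Spark version on Dataproc image 2.1). The sketch below is a simplified, hypothetical illustration of that kind of version lookup; the function and class names are illustrative and are not the connector's actual code.

```python
# Hypothetical sketch of version-based pushdown implementation lookup.
# Names (find_pushdown_impl, Spark3xBigQueryPushdown) are illustrative only.

def find_pushdown_impl(spark_version, implementations):
    """Return the first implementation whose supported major.minor
    version matches the running Spark version, or raise."""
    major_minor = ".".join(spark_version.split(".")[:2])
    for impl in implementations:
        if impl["supports"] == major_minor:
            return impl["name"]
    raise RuntimeError(
        "Could not find an implementation of SparkBigQueryPushdown "
        f"that supports Spark version {spark_version}"
    )

# Illustrative registry with no Spark 3.3 entry, as in the reported error.
impls = [
    {"supports": "2.4", "name": "Spark24BigQueryPushdown"},
    {"supports": "3.1", "name": "Spark31BigQueryPushdown"},
    {"supports": "3.2", "name": "Spark32BigQueryPushdown"},
]

print(find_pushdown_impl("3.2.1", impls))  # a supported version matches
try:
    find_pushdown_impl("3.3.2", impls)     # Spark 3.3.2 has no entry
except RuntimeError as e:
    print(e)
```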
isha97 commented
Hi @mina-asham,
This will be fixed by #1152.
In the meantime, I have created a connector jar you can use until the next release:
https://storage.googleapis.com/davidrab-public/spark-bigquery-with-dependencies_2.12-202312211517.jar
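One way to apply the jar above (an assumption about the setup, not stated in the thread) is to copy it to a GCS bucket you control and pass it to the job via the `--jars` flag of `gcloud dataproc jobs submit pyspark`, for example:

```shell
# Hypothetical usage of the workaround jar; MY_BUCKET is a placeholder for
# a GCS bucket you control, after copying the jar there.
gcloud dataproc jobs submit pyspark /tmp/run.py \
  --project=$MY_PROJECT \
  --cluster=$MY_CLUSTER \
  --region=us-central1 \
  --jars=gs://MY_BUCKET/spark-bigquery-with-dependencies_2.12-202312211517.jar
```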