GoogleCloudDataproc/spark-bigquery-connector

Bug: Enabling predicate pushdown fails

mina-asham opened this issue

Hi,

I am unable to enable the predicate pushdown feature; I get this error:

Could not find an implementation of com.google.cloud.spark.bigquery.pushdowns.SparkBigQueryPushdown that supports Spark version 3.3.2

Here are the steps to reproduce:

Variables:

  • MY_PROJECT: the project ID
  • MY_CLUSTER: the name of the cluster created in step 1
  1. Create cluster
gcloud dataproc clusters create cluster-70cf \
--project $MY_PROJECT \
--image-version 2.1-debian11 \
--metadata SPARK_BQ_CONNECTOR_VERSION=0.33.0 \
--region us-central1 \
--master-machine-type n2-standard-4 --master-boot-disk-size 100 \
--num-workers 2 --worker-machine-type n2-standard-4 --worker-boot-disk-size 100
  2. Script to run (saved at /tmp/run.py; see the read sketch after step 3 for what a pushdown query against this session could look like)
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Enable the connector's predicate pushdown for this session via the JVM gateway
spark.sparkContext._jvm.com.google.cloud.spark.bigquery.BigQueryConnectorUtils.enablePushdownSession(spark._jsparkSession)
  3. Run the script
gcloud dataproc jobs submit pyspark /tmp/run.py \
--project=$MY_PROJECT \
--cluster=$MY_CLUSTER \
--region=us-central1
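
For reference, a minimal sketch of a read that would exercise pushdown once the enablePushdownSession call above succeeds; the public table and filter (bigquery-public-data.samples.shakespeare, word_count > 100) are illustrative placeholders, not part of the original report:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Enable pushdown first, as in step 2
spark.sparkContext._jvm.com.google.cloud.spark.bigquery.BigQueryConnectorUtils.enablePushdownSession(spark._jsparkSession)

# With pushdown enabled, this filter would be pushed down to BigQuery
df = (spark.read.format("bigquery")
      .option("table", "bigquery-public-data.samples.shakespeare")
      .load()
      .filter("word_count > 100"))
df.show()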

Hi @mina-asham,

This will be fixed with #1152
I have also created a connector jar that you can use until we have a new release:
https://storage.googleapis.com/davidrab-public/spark-bigquery-with-dependencies_2.12-202312211517.jar
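
In case it helps, a sketch of how that jar could be attached to the same repro job (assuming the usual gs:// form of that public URL and using the --jars flag of gcloud dataproc jobs submit pyspark):

gcloud dataproc jobs submit pyspark /tmp/run.py \
--project=$MY_PROJECT \
--cluster=$MY_CLUSTER \
--region=us-central1 \
--jars=gs://davidrab-public/spark-bigquery-with-dependencies_2.12-202312211517.jar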