Spark BQ connector doesn't work when reading a partitioned table?
gomrinal opened this issue · 5 comments
gomrinal commented
Note: I am testing/running Spark on Dataproc, so it has the spark-bigquery-connector pre-installed!
Problem: it looks like the Spark BQ connector doesn't work when reading a table that is partitioned. I get this error while reading a partitioned BQ table:
Py4JJavaError: An error occurred while calling o86.showString.
: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.UnavailableException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: UNAVAILABLE: The service is currently unavailable.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('read data from bq').getOrCreate()

table = '<project_id>.<dataset_name>.<table_name>'
data = spark.read.format('bigquery').option('table', table).load()

# Error happens when I use actions like `.show()`
data.show()
The same code works for a table that is not partitioned!
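One hedged workaround sketch, assuming the UNAVAILABLE error comes from the BigQuery Storage Read API while scanning the partitioned table: push a partition filter down via the connector's documented `filter` option, so only matching partitions are read. `<partition_column>` and the date literal are placeholders for your table's partitioning column and range:

```python
def read_partitioned(spark, table, partition_filter):
    """Read a BigQuery table through the spark-bigquery-connector,
    pushing `partition_filter` down so only matching partitions
    are scanned by the Storage Read API."""
    return (spark.read.format('bigquery')
            .option('table', table)
            .option('filter', partition_filter)
            .load())

# Usage (on a Dataproc cluster with the connector installed):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName('read partitioned bq table').getOrCreate()
# df = read_partitioned(spark,
#                       '<project_id>.<dataset_name>.<table_name>',
#                       "<partition_column> >= '2023-01-01'")
# df.show()
```

This does not fix a genuinely transient UNAVAILABLE response, but it reduces the amount of data streamed per read, which is often where such errors surface.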
davidrabinowitz commented
Can you please share:
- Which Spark and Scala versions are you using?
- The full stack trace?
gomrinal commented
Python Version : Python 3.10.8
Spark Version: v3.3.2
gomrinal commented
However, for the same table, the read works if I go through a query job instead:
from pyspark.sql import SparkSession
from google.cloud import bigquery

spark = (SparkSession.builder
         .appName("BQ read example")
         .getOrCreate())

QUERY = """
SELECT *
FROM <table>
LIMIT 1000
"""

# Run the query with the BigQuery client and wait for it to finish.
bq = bigquery.Client()
query_job = bq.query(QUERY)
query_job.result()

# Read the query's destination table back through the connector.
df = (spark.read.format('bigquery')
      .option('dataset', query_job.destination.dataset_id)
      .load(query_job.destination.table_id))
df.show()
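For reference, the connector can also run the SQL itself, without a separate `google.cloud.bigquery` client: with the documented `viewsEnabled` and `materializationDataset` options set, the `query` option materializes the results to a temporary table and reads that back. A sketch, assuming `<dataset_name>` is a dataset your job can write to:

```python
def read_query_results(spark, sql, materialization_dataset):
    """Run `sql` via the spark-bigquery-connector and read the result.

    `viewsEnabled` and `materializationDataset` are documented connector
    options: the query results are written to a temporary table in
    `materialization_dataset`, which the connector then reads."""
    return (spark.read.format('bigquery')
            .option('viewsEnabled', 'true')
            .option('materializationDataset', materialization_dataset)
            .option('query', sql)
            .load())

# Usage:
# df = read_query_results(spark,
#                         "SELECT * FROM <table> LIMIT 1000",
#                         '<dataset_name>')
# df.show()
```

Since the materialized result is an ordinary (unpartitioned) temporary table, this takes the same path as the query-job workaround above, just with one fewer client library.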