BigQuery Storage API always returning 200 partitions
Closed this issue · 1 comment
rcanzanese commented
I'm setting preferredMinParallelism and maxParallelism successfully, but no matter what I do, I always end up with 200 partitions, regardless of how big the underlying table is -- I've tried with tables as big as 4 TiB with the same result.
spark:spark.datasource.bigquery.preferredMinParallelism: "33333"
spark:spark.datasource.bigquery.maxParallelism: "33333"
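For reference, the same parallelism hints can also be passed per-read as connector options instead of cluster-wide Spark properties. A minimal sketch (table name and session are placeholders, not from the issue):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-read").getOrCreate()

# Request up to 33333 read streams for this read only.
# The BigQuery Storage API may still return fewer streams
# than requested, depending on table size and quotas.
df = (
    spark.read.format("bigquery")
    .option("preferredMinParallelism", "33333")
    .option("maxParallelism", "33333")
    .load("my-project.my_dataset.my_table")  # hypothetical table
)

# The actual partition count reflects what the API granted.
print(df.rdd.getNumPartitions())
```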
The message I receive with the following settings is:
Requested 33333 max partitions, but only received 200 from the BigQuery Storage API for session
Is there some additional config that I am missing?
isha97 commented
Hi @rcanzanese ,
The actual number of partitions may be less than preferredMinParallelism if BigQuery deems the data small enough. There are also quotas on the number of partitions per read session, which restrict parallelism. Please file a request with support to increase the quota for your project.