GoogleCloudDataproc/spark-bigquery-connector

Flaky behavior when writing to BigQuery

imrimt opened this issue · 2 comments

imrimt commented

I'm trying to copy one BigQuery table to another, but the write fails intermittently (I have masked the project, dataset, and table names for security purposes). The copy itself is essentially the standard connector read-then-write pattern; a minimal sketch of what I'm running, with placeholder names, looks like this:
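import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("bq-table-copy")
  .getOrCreate()

// Read the source table through the BigQuery Storage Read API.
val source = spark.read
  .format("bigquery")
  .load("<project>.<dataset>.<source-table>")

// Write to the target table; the stack trace below shows the direct
// (Storage Write API) writer is the path being exercised.
source.write
  .format("bigquery")
  .option("writeMethod", "direct")
  .mode("overwrite")
  .save("<project>.<dataset>.<target-table>")

The failing runs produce this error: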

23/11/20 16:24:28 INFO com.google.cloud.spark.bigquery.direct.BigQueryRDDFactory: Created read session for table '<project>.<dataset>.<source-table>': projects/<project>/locations/us-east1/sessions/CAISDHhVa1pSU3JXaWVidhoCdngaAnZs
23/11/20 16:24:29 ERROR org.apache.spark.scheduler.TaskSetManager: task 0.0 in stage 39.0 (TID 39) had a not serializable result: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status
Serialization stack:
  - object not serializable (class: com.google.cloud.spark.bigquery.repackaged.io.grpc.Status, value: Status{code=NOT_FOUND, description=Requested entity was not found. Entity: projects/<project>/datasets/<dataset>/tables/<target-table>/streams/Cig2ZTE5NTc4OC0wMDAwLTIyN2MtOTQxNS1mNGY1ZTgwZjdiYjQ6czEx, cause=null})
  - writeObject data (class: java.lang.Throwable)
  - object (class com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException, com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/<project>/datasets/<dataset>/tables/<target-table>/streams/Cig2ZTE5NTc4OC0wMDAwLTIyN2MtOTQxNS1mNGY1ZTgwZjdiYjQ6czEx)
  - writeObject data (class: java.lang.Throwable)
  - object (class com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException, com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/<project>/datasets/<dataset>/tables/<target-table>/streams/Cig2ZTE5NTc4OC0wMDAwLTIyN2MtOTQxNS1mNGY1ZTgwZjdiYjQ6czEx)
  - writeObject data (class: java.lang.Throwable)
  - object (class java.util.concurrent.ExecutionException, java.util.concurrent.ExecutionException: com.google.cloud.spark.bigquery.repackaged.com.google.api.gax.rpc.NotFoundException: com.google.cloud.spark.bigquery.repackaged.io.grpc.StatusRuntimeException: NOT_FOUND: Requested entity was not found. Entity: projects/<project>/datasets/<dataset>/tables/<target-table>/streams/Cig2ZTE5NTc4OC0wMDAwLTIyN2MtOTQxNS1mNGY1ZTgwZjdiYjQ6czEx)
  - writeObject data (class: java.lang.Throwable)
  - object (class com.google.cloud.bigquery.connector.common.BigQueryConnectorException, com.google.cloud.bigquery.connector.common.BigQueryConnectorException: Could not retrieve AppendRowsResponse)
  - field (class: com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, name: val$e, type: class java.lang.Exception)
  - object (class com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1, com.google.cloud.spark.bigquery.write.DataSourceWriterContextPartitionHandler$1@11586cf4)
  - element of array (index: 0)
  - array (class [Ljava.lang.Object;, size 1); not retrying
23/11/20 16:24:29 WARN com.google.cloud.spark.bigquery.write.context.BigQueryDirectDataSourceWriterContext: BigQuery Data Source writer 264dc07d-49f8-4174-9596-82d9484f9202 aborted
23/11/20 16:24:29 ERROR com.tamr.flex.copy.spark.Errors: Copy failed. Exit.
com.google.cloud.bigquery.connector.common.BigQueryConnectorException: unexpected issue trying to save [string_field: string, array_field: array<string> ... 1 more field]
  at com.google.cloud.spark.bigquery.write.BigQueryDataSourceWriterInsertableRelation.insert(BigQueryDataSourceWriterInsertableRelation.java:128)
  at com.google.cloud.spark.bigquery.write.CreatableRelationProviderHelper.createRelation(CreatableRelationProviderHelper.java:54)
  at com.google.cloud.spark.bigquery.BigQueryRelationProvider.createRelation(BigQueryRelationProvider.scala:107)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:133)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:132)
  at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
  at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
  <omitted internal details>

What makes this hard to debug is that it doesn't happen every time; I'd say it fails roughly half the time. I tried searching online, but I wasn't sure what to look for, since "Requested entity was not found" is too generic an error. I'd appreciate any help with this!
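Since the stack trace points at the direct (Storage Write API) writer, one mitigation I'm considering is falling back to the indirect write method, which stages the data in GCS and loads it via a BigQuery load job instead. Assuming a staging bucket is available, that would look roughly like:

// Hypothetical fallback; <staging-bucket> is a placeholder for a real GCS bucket.
source.write
  .format("bigquery")
  .option("writeMethod", "indirect")
  .option("temporaryGcsBucket", "<staging-bucket>")
  .mode("overwrite")
  .save("<project>.<dataset>.<target-table>")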

Connector version: spark-bigquery-with-dependencies_2.12:0.32.2@jar

What is the Spark version you're using? Can you please share a complete, runnable sample that reproduces this? Are there any specific connector options being used?

If you are still facing this, please create a new issue with all the details.