GoogleCloudDataproc/spark-bigquery-connector

Unable to write decimal values with scale > 9 to BigQuery using Dataproc and the Spark BigQuery connector.


Using the configs below (a combined sketch follows the list):

  1. `config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.39.1")` in the Spark session.
  2. `.option("decimalTargetTypes", ["BIGNUMERIC", "NUMERIC"])` while writing the DataFrame to BigQuery.
  3. `metadata = {"SPARK_BQ_CONNECTOR_VERSION": "gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.39.1.jar"}`
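
For context, here is roughly how items 1 and 2 fit together in a PySpark job. This is a minimal sketch, not our actual code; the source path, target table, and GCS bucket are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# Sketch of the configuration described above; all names below are hypothetical.
spark = (
    SparkSession.builder
    .appName("bq-decimal-write")
    .config(
        "spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.39.1",
    )
    .getOrCreate()
)

df = spark.read.parquet("gs://my-bucket/input/")  # hypothetical source data

(
    df.write.format("bigquery")
    .option("table", "my_project.my_dataset.my_table")        # hypothetical target
    .option("temporaryGcsBucket", "my-temp-bucket")           # hypothetical bucket
    .option("decimalTargetTypes", ["BIGNUMERIC", "NUMERIC"])  # item 2 above, as reported
    .mode("append")
    .save()
)
```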

Tested with SPARK_BQ_CONNECTOR_VERSION=0.32.2 as well, but the issue is the same.

We are using a Dataproc workflow template with Dataproc version 2.1.0-ubuntu20.

Some columns have the data type decimal(38, 10), which causes the write to BigQuery to fail.

Error message: "com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.BigQueryException: Error while reading data, error message: The value for column 'ABSOLUTE_RANK' is out of valid NUMERIC range: Value will lose precision after scaling down to NUMERIC type"
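
Since BigQuery's NUMERIC type caps scale at 9 digits, any decimal(38, 10) value with a significant 10th decimal digit cannot be scaled down losslessly. A minimal repro sketch, reusing the `spark` session from the snippet above (the value is made up; the column name is taken from the error):

```python
from decimal import Decimal

from pyspark.sql.types import DecimalType, StructField, StructType

# One column with the failing type decimal(38, 10); the value is hypothetical.
schema = StructType([StructField("ABSOLUTE_RANK", DecimalType(38, 10))])
df = spark.createDataFrame([(Decimal("1234567890.1234567891"),)], schema)

# Writing this DataFrame with the options listed above reproduces the
# "out of valid NUMERIC range" error, because the value needs all 10
# decimal digits while NUMERIC keeps at most 9.
```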

When we write to BQ with intermediateFormat = "orc", the columns with decimal(38, 10) are saved to BQ as STRING instead (sketch below).
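
For completeness, the variant with the intermediate format forced to ORC, which produces the STRING columns described above (again a sketch with hypothetical names):

```python
# Same write as above, but with ORC as the intermediate format.
# With this setting the decimal(38, 10) columns land in BigQuery as STRING
# rather than failing the load, matching the behaviour we observe.
(
    df.write.format("bigquery")
    .option("table", "my_project.my_dataset.my_table")  # hypothetical target
    .option("temporaryGcsBucket", "my-temp-bucket")     # hypothetical bucket
    .option("intermediateFormat", "orc")
    .mode("append")
    .save()
)
```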

@abhilash499 Can you please provide sample code and data that you are trying to write?

@abhilash499 Please reopen the issue with the details if you are still facing this.