ClassCastException occurs (Double cannot be cast to Float)
Closed this issue · 3 comments
Hi, I'm trying to analyze Firebase data using Spark with this spark-bigquery connector, but a ClassCastException occurs: Double cannot be cast to Float.
Additionally, the Double type exists in the Avro spec, but it seems the module only casts to Float. (https://avro.apache.org/docs/1.8.1/spec.html)
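For reference, the Avro spec defines float (32-bit) and double (64-bit) as distinct primitive types, so a field declared as double must land in a 64-bit type on the Spark side. An illustrative record fragment mirroring the value struct in the Firebase export (this is a sketch, not the actual export schema):

```json
{
  "type": "record",
  "name": "value",
  "fields": [
    {"name": "float_value",  "type": ["null", "float"]},
    {"name": "double_value", "type": ["null", "double"]}
  ]
}
```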
Would you mind telling me whether this is a bug?
The Firebase export schema is documented here: https://support.google.com/firebase/answer/7029846
Error detail
- command
val df = spark.sqlContext.read.format("com.samelamin.spark.bigquery")
.option("tableReferenceSource","xxxx:yyy.app_events_intraday_20180417")
.load()
df.printSchema
- output
root
|-- user_dim: struct (nullable = true)
| |-- user_id: string (nullable = true)
| |-- first_open_timestamp_micros: long (nullable = true)
| |-- user_properties: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- key: string (nullable = true)
| | | |-- value: struct (nullable = true)
| | | | |-- value: struct (nullable = true)
| | | | | |-- string_value: string (nullable = true)
| | | | | |-- int_value: long (nullable = true)
| | | | | |-- float_value: float (nullable = true)
| | | | | |-- double_value: float (nullable = true)
| | | | |-- set_timestamp_usec: long (nullable = true)
| | | | |-- index: long (nullable = true)
| |-- device_info: struct (nullable = true)
| | |-- device_category: string (nullable = true)
| | |-- mobile_brand_name: string (nullable = true)
| | |-- mobile_model_name: string (nullable = true)
| | |-- mobile_marketing_name: string (nullable = true)
| | |-- device_model: string (nullable = true)
| | |-- platform_version: string (nullable = true)
| | |-- device_id: string (nullable = true)
| | |-- resettable_device_id: string (nullable = true)
| | |-- user_default_language: string (nullable = true)
| | |-- device_time_zone_offset_seconds: long (nullable = true)
| | |-- limited_ad_tracking: boolean (nullable = true)
| |-- geo_info: struct (nullable = true)
| | |-- continent: string (nullable = true)
| | |-- country: string (nullable = true)
| | |-- region: string (nullable = true)
| | |-- city: string (nullable = true)
| |-- app_info: struct (nullable = true)
| | |-- app_version: string (nullable = true)
| | |-- app_instance_id: string (nullable = true)
| | |-- app_store: string (nullable = true)
| | |-- app_platform: string (nullable = true)
| | |-- app_id: string (nullable = true)
| |-- traffic_source: struct (nullable = true)
| | |-- user_acquired_campaign: string (nullable = true)
| | |-- user_acquired_source: string (nullable = true)
| | |-- user_acquired_medium: string (nullable = true)
| |-- bundle_info: struct (nullable = true)
| | |-- bundle_sequence_id: long (nullable = true)
| | |-- server_timestamp_offset_micros: long (nullable = true)
| |-- ltv_info: struct (nullable = true)
| | |-- revenue: float (nullable = true)
| | |-- currency: string (nullable = true)
|-- event_dim: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- date: string (nullable = true)
| | |-- name: string (nullable = true)
| | |-- params: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- key: string (nullable = true)
| | | | |-- value: struct (nullable = true)
| | | | | |-- string_value: string (nullable = true)
| | | | | |-- int_value: long (nullable = true)
| | | | | |-- float_value: float (nullable = true)
| | | | | |-- double_value: float (nullable = true)
| | |-- timestamp_micros: long (nullable = true)
| | |-- previous_timestamp_micros: long (nullable = true)
| | |-- value_in_usd: float (nullable = true)
- command
import org.apache.spark.sql.functions._
df.show
- output
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 9, 10.228.249.82, executor 0):
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
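The mechanism behind this error can be reproduced in plain Scala, with no Spark involved: a value boxed as java.lang.Double can only be unboxed to Double, so a schema that declares the column as FloatType forces an unboxToFloat call that fails. A minimal sketch:

```scala
// BigQuery FLOAT64 values arrive from the Avro reader boxed as java.lang.Double.
val boxed: Any = java.lang.Double.valueOf(3.14)

// Unboxing to Double succeeds because the runtime class matches:
val ok: Double = boxed.asInstanceOf[Double]
println(ok)  // 3.14

// A FloatType column makes Spark call BoxesRunTime.unboxToFloat, which
// requires a boxed java.lang.Float, so the same value fails here:
try {
  boxed.asInstanceOf[Float]
} catch {
  case e: ClassCastException => println(e.getMessage)
}
```

This is the same unboxToFloat call that appears at the top of the stack trace above.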
Hi @smdmts, correct: we should be casting float to float, not double to float.
Good find!
Feel free to send a PR in.
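For anyone picking this up, the fix lives in the connector's schema conversion, where BigQuery's FLOAT type (which is a 64-bit IEEE double) should map to Spark's DoubleType rather than FloatType. A simplified, hypothetical sketch of that mapping (the real converter inside the library has a different shape and different names, but the fix is the same idea):

```scala
// Hypothetical, simplified BigQuery-to-Spark type mapping for illustration;
// this is not the connector's actual API.
def bigQueryToSparkTypeName(bqType: String): String = bqType match {
  case "STRING"    => "StringType"
  case "INTEGER"   => "LongType"      // BigQuery INTEGER is 64-bit
  case "BOOLEAN"   => "BooleanType"
  case "TIMESTAMP" => "TimestampType"
  // BigQuery FLOAT is a 64-bit double, so it must become DoubleType.
  // Mapping it to FloatType is what caused the ClassCastException above.
  case "FLOAT"     => "DoubleType"
  case other       => sys.error(s"unsupported BigQuery type: $other")
}

println(bigQueryToSparkTypeName("FLOAT"))  // DoubleType
```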
Hi,
I am using the following import in my Scala code:
import com.samelamin.spark.bigquery._
I have a Hive table that was imported into BigQuery via an Avro file. It is pretty simple; the code first loads this table:
```scala
// read data from BigQuery table
println("\nreading data from " + fullyQualifiedInputTableId)
val df = spark.sqlContext
  .read
  .format("com.samelamin.spark.bigquery")
  .option("tableReferenceSource", fullyQualifiedInputTableId)
  .load()
df.printSchema
// create a temporary view on the DataFrame
df.createOrReplaceTempView("tmp")
```
This is the output:
reading data from axial-glow-224522:accounts.ll_18201960
root
|-- transactiondate: string (nullable = true)
|-- transactiontype: string (nullable = true)
|-- sortcode: string (nullable = true)
|-- accountnumber: string (nullable = true)
|-- transactiondescription: string (nullable = true)
|-- debitamount: float (nullable = true)
|-- creditamount: float (nullable = true)
|-- balance: float (nullable = true)
The tmp view is created. However, when trying to read debitamount, defined as float, I get the following error:
spark.sql("select transactiondate,transactiontype, sortcode, accountnumber, transactiondescription, debitamount from tmp").collect.foreach(println)
18/12/27 19:41:59 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, rhes77-cluster-w-1.europe-west2-a.c.axial-glow-224522.internal, executor 1): java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
  at scala.runtime.BoxesRunTime.unboxToFloat(BoxesRunTime.java:109)
  at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getFloat(rows.scala:43)
  at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getFloat(rows.scala:195)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Is there any workaround for this, please?
Thanks,
Mich
Hi,
I now have a workaround for this issue: using Spark DataFrame transformations to cast from String to Date and from String to Double where appropriate, and then saving the data to the BigQuery table.
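For reference, the conversions behind those casts can be sketched in plain Scala; in a DataFrame this would be `col("transactiondate").cast("date")` and `col("debitamount").cast("double")`, and the literal values below are purely illustrative:

```scala
import java.time.LocalDate

// String -> Date, as cast("date") would do for an ISO-formatted column value
val transactionDate: LocalDate = LocalDate.parse("2018-12-27")

// String -> Double, as cast("double") would do for a numeric column value
val debitAmount: Double = "123.45".toDouble

println(transactionDate)  // 2018-12-27
println(debitAmount)      // 123.45
```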
Let me know your thoughts.
Thanks