Date fields cause Java cast error
Closed this issue · 4 comments
If any of the indexed metadata fields are set up as datetime, when trying to retrieve the fields via Zeppelin a cast error is generated:
java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.sql.Timestamp
Dataframe:
df: org.apache.spark.sql.DataFrame = [domain-name: string, site: string, meas-start-datetime: timestamp, meas-end-datetime: timestamp, meas-strm-name: string, meas-strm-id: string, ObjectName: string, LastModified: timestamp, Owner: string, Size: int, CreateTime: timestamp, LastModified: timestamp, Owner: string, ContentType: string, Etag: string, Size: int, CreateTime: timestamp, Expiration: timestamp, ContentEncoding: string, Expires: timestamp, Retention: int, Namespace: string, ObjectName: string, Key: string]
Query:
%sql
SELECT * FROM ECS WHERE `domain-name` = "D03" LIMIT 10
Error:
java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.sql.Timestamp
at org.apache.spark.sql.catalyst.CatalystTypeConverters$TimestampConverter$.toCatalystImpl(CatalystTypeConverters.scala:313)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:97)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
@brokenjacobs what version of Spark are you using? Thanks
The version that ships in the Docker file. Although I've been informed you're already working on the issue, so hopefully that's not too late.
spark-ecs-s3_2.10:1.0-SNAPSHOT is the version per the zeppelin notebook
@brokenjacobs please see PR #9, which has a fix for this issue. I assume you're using Spark 2.x with Scala 2.11. Feel free to discuss the PR on that thread.
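For anyone hitting this before the fix lands: the stack trace shows Catalyst's `TimestampConverter` casting row values to `java.sql.Timestamp`, so any `org.joda.time.Instant` must be converted before the rows reach Spark (with joda-time on the classpath that would be `new Timestamp(instant.getMillis)`). A minimal sketch of the conversion, using raw epoch milliseconds so it runs without joda-time:

```scala
import java.sql.Timestamp

// Catalyst expects java.sql.Timestamp for TimestampType columns.
// With joda-time this helper would instead take an org.joda.time.Instant
// and call instant.getMillis; epoch millis are used here so the snippet
// is self-contained.
def toSqlTimestamp(epochMillis: Long): Timestamp =
  new Timestamp(epochMillis)

// Example: 2016-01-01T00:00:00Z as epoch millis
val ts = toSqlTimestamp(1451606400000L)
println(ts.getTime)  // 1451606400000
```

This is only a sketch of the workaround; the actual fix in the connector (mapping Joda instants to `java.sql.Timestamp` when rows are materialized) is what PR #9 addresses.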