EMCECS/spark-ecs-connector

Date fields cause Java cast error


If any of the indexed metadata fields are set up as datetime, retrieving those fields via Zeppelin produces a cast error:

java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.sql.Timestamp
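
The mismatch can be seen in isolation: Catalyst's TimestampConverter only accepts java.sql.Timestamp, so any raw Joda-Time value left in a Row fails at conversion time. A minimal standalone sketch of the cast (an illustration, not connector code):

import org.joda.time.Instant

// A Joda Instant is not a java.sql.Timestamp, so the cast Catalyst
// performs on Row values throws ClassCastException at runtime:
val value: Any = Instant.now()
val ts = value.asInstanceOf[java.sql.Timestamp]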

Dataframe:
df: org.apache.spark.sql.DataFrame = [domain-name: string, site: string, meas-start-datetime: timestamp, meas-end-datetime: timestamp, meas-strm-name: string, meas-strm-id: string, ObjectName: string, LastModified: timestamp, Owner: string, Size: int, CreateTime: timestamp, LastModified: timestamp, Owner: string, ContentType: string, Etag: string, Size: int, CreateTime: timestamp, Expiration: timestamp, ContentEncoding: string, Expires: timestamp, Retention: int, Namespace: string, ObjectName: string, Key: string]

Query:

%sql
SELECT * FROM ECS WHERE `domain-name` = "D03" LIMIT 10
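
The same conversion path can likely be hit from a Scala paragraph as well, without the %sql interpreter; for example (assuming the df shown above and Zeppelin's sqlContext, on Spark 1.x where registerTempTable applies):

df.registerTempTable("ECS")
sqlContext.sql("SELECT * FROM ECS WHERE `domain-name` = 'D03' LIMIT 10").show()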

Error:

java.lang.ClassCastException: org.joda.time.Instant cannot be cast to java.sql.Timestamp
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$TimestampConverter$.toCatalystImpl(CatalystTypeConverters.scala:313)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:97)
	at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
	at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:59)
	at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$2.apply(ExistingRDD.scala:56)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

@brokenjacobs what version of Spark are you using? Thanks

The version that ships in the Dockerfile. Although I've been informed you're already working on the issue, so hopefully that's not too late.

spark-ecs-s3_2.10:1.0-SNAPSHOT is the version per the Zeppelin notebook.

@brokenjacobs please see PR #9, which has a fix for this issue. I assume you're using Spark 2.x with Scala 2.11. Feel free to discuss the PR on that thread.
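
For anyone hitting this before picking up the PR, the shape of the fix is converting Joda-Time instants to java.sql.Timestamp before rows reach Catalyst; a hedged sketch (the helper name here is hypothetical, and the actual change is in PR #9):

import java.sql.Timestamp
import org.joda.time.Instant

// Hypothetical helper: Joda epoch millis map directly onto the
// java.sql.Timestamp(long) constructor that Catalyst expects.
def toSqlTimestamp(i: Instant): Timestamp = new Timestamp(i.getMillis)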