amplab/succinct

Problems with Spark 1.4

mrt opened this issue · 3 comments

mrt commented

I've built Succinct with Spark 1.4, expecting a smooth migration, but the SQL module generates several errors. I've resolved a few of them but still haven't figured out how to fix the serialVersionUID mismatches like the ones below.

[info] - dsl test *** FAILED ***
[info] java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = 8479641856817081483, local class serialVersionUID = -7860166653361823912
[info] at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
[info] at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
[info] at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
[info] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
[info] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
[info] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
[info] at edu.berkeley.cs.succinct.sql.SuccinctUtils$.readObjectFromFS(SuccinctUtils.scala:40)
[info] at edu.berkeley.cs.succinct.sql.SuccinctRelation.getSchema(SuccinctRelation.scala:29)
[info] at edu.berkeley.cs.succinct.sql.SuccinctRelation.<init>(SuccinctRelation.scala:14)
[info] at edu.berkeley.cs.succinct.sql.package$SuccinctContext.succinctFile(package.scala:12)
[info] ...

[info] - sql test *** FAILED ***
[info] java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = 8479641856817081483, local class serialVersionUID = -7860166653361823912
[info] at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
[info] at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
[info] at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
[info] at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
[info] at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
[info] at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
[info] at edu.berkeley.cs.succinct.sql.SuccinctUtils$.readObjectFromFS(SuccinctUtils.scala:40)
[info] at edu.berkeley.cs.succinct.sql.SuccinctRelation.getSchema(SuccinctRelation.scala:29)
[info] at edu.berkeley.cs.succinct.sql.SuccinctRelation.<init>(SuccinctRelation.scala:14)
[info] at edu.berkeley.cs.succinct.sql.DefaultSource.createRelation(DefaultSource.scala:18)
[info] ...

It is probably a bad idea to serialize Spark SQL classes like StructType directly. The exception is most likely due to an incompatible serialVersionUID: the StructType serialized by SuccinctTableRDD no longer matches the local StructType class. The simplest solution is to serialize StructType's component fields rather than the StructType itself.
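
To illustrate the idea of serializing component fields, here is a minimal sketch in plain Java. It is not the code used in Succinct: `SchemaFields` and `FieldSpec` are hypothetical names, and `FieldSpec` is only a stand-in for the name, type, and nullability of one schema column. The point is that primitives and strings written through `DataOutputStream` carry no class descriptor, so no serialVersionUID check can fail when the data is read back under a different library version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaFields {
    /** Hypothetical stand-in for one schema column (not a Spark class). */
    public static final class FieldSpec {
        public final String name;
        public final String typeName;
        public final boolean nullable;

        public FieldSpec(String name, String typeName, boolean nullable) {
            this.name = name;
            this.typeName = typeName;
            this.nullable = nullable;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FieldSpec)) return false;
            FieldSpec other = (FieldSpec) o;
            return name.equals(other.name)
                && typeName.equals(other.typeName)
                && nullable == other.nullable;
        }

        @Override
        public int hashCode() {
            return 31 * (31 * name.hashCode() + typeName.hashCode()) + (nullable ? 1 : 0);
        }

        @Override
        public String toString() {
            return name + ":" + typeName + (nullable ? "?" : "");
        }
    }

    /** Writes a field count, then each field as (name, type name, nullable). */
    public static byte[] write(List<FieldSpec> fields) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(fields.size());
            for (FieldSpec f : fields) {
                out.writeUTF(f.name);
                out.writeUTF(f.typeName);
                out.writeBoolean(f.nullable);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the fields back in the same order they were written. */
    public static List<FieldSpec> read(byte[] bytes) {
        List<FieldSpec> fields = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                String typeName = in.readUTF();
                boolean nullable = in.readBoolean();
                fields.add(new FieldSpec(name, typeName, nullable));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<FieldSpec> fields = Arrays.asList(
            new FieldSpec("id", "integer", false),
            new FieldSpec("name", "string", true));
        List<FieldSpec> restored = read(write(fields));
        System.out.println(restored);
    }
}
```

In a real fix the schema object would be rebuilt from these values after reading, so the on-disk format depends only on this small, explicit layout and not on the internals of the Spark class.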

Succinct now builds cleanly with Spark 1.4.

I am facing this error as well.

I am trying to submit a SparkR job to my Cloudera cluster from a separate Spark standalone installation. Can you please help?

I have a Spark standalone system and I am using it to submit a SparkR job to an existing Cloudera CDH cluster.

Apache Spark Version
1.5.0, Hadoop 2.6

Cloudera Spark Version
1.5.0-cdh5.5.1, Hadoop 2.6.0-cdh5.5.1

Code:

```r
library(SparkR, lib.loc = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/R/lib")

sc <- sparkR.init(master = "spark://10.103.25.39:7077", appName = "SparkR_demo_RTA", sparkHome = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6", sparkEnvir = list(spark.executor.memory = '512m'))

sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)
head(df)

sparkR.stop()
```

Next, I submit this sparkR-testcluster.R file as follows:

```
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
export SPARK_DIST_CLASSPATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/etc/hadoop/conf.pseudo
./bin/spark-submit --master spark://10.103.25.39:7077 /opt/BIG-DATA/SparkR/sparkR-testcluster.R
```

However, I get the following error (which, if I understand it correctly, is caused by a version mismatch):
`
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20161214124958-0014/151 on hostPort 10.103.40.186:7078 with 4 cores, 512.0 MB RAM
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now RUNNING
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now LOADING
16/12/14 12:51:26 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, 10.103.40.207): java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2623502157469710728, local class serialVersionUID = 1299744747852393705
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

16/12/14 12:51:26 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, 10.103.25.39, PROCESS_LOCAL, 13045 bytes)
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/151 is now EXITED (Command exited with code 1)
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161214124958-0014/151 removed: Command exited with code 1
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 151
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161214124958-0014/152 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores

....................
....................

16/12/14 12:51:26 ERROR r.RBackendHandler: dfToCols on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, 10.103.40.207): java.io.InvalidClassException: org.apache.spark.sql.types.StructType; local class incompatible: stream classdesc serialVersionUID = -2623502157469710728, local class serialVersionUID = 1299744747852393705
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.
Calls: head ... collect -> collect -> .local -> callJStatic -> invokeJava
Execution halted
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/154 is now LOADING
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor updated: app-20161214124958-0014/154 is now EXITED (Command exited with code 1)
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Executor app-20161214124958-0014/154 removed: Command exited with code 1
16/12/14 12:51:26 INFO cluster.SparkDeploySchedulerBackend: Asked to remove non-existent executor 154
16/12/14 12:51:26 INFO client.AppClient$ClientEndpoint: Executor added: app-20161214124958-0014/155 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores
`
I can't figure out where I am going wrong.

Any help?