Crashing for a larger data set
manjush3v opened this issue · 2 comments
manjush3v commented
I am running a SparkContext with the following configuration:
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark-master-url")  # placeholder for the actual spark:// master URL
        .setAppName("PySparkShell")
        .set("spark.executor.memory", "6800M"))
sc = SparkContext(conf=conf)
The program works fine when the length of X_train is 5,000, but it fails when the size is increased to 12,000.
Spark keeps crashing with the following error:
Lost task 13.0 in stage 1.0 (TID 109, 172.31.8.203, executor 1): org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:230)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
... 11 more
More details here
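The java.io.EOFException from PythonRDD typically means the Python worker process died partway through a task, often because it was killed for using too much memory on an oversized partition. One common mitigation is to split the input into more, smaller partitions. A minimal sketch, assuming X_train is handed to Spark via sc.parallelize (the report does not show how the RDD is built, and the partition count here is an illustrative assumption, not a value from the thread):

num_partitions = 200  # more, smaller partitions mean less data per Python worker
rdd = sc.parallelize(X_train, numSlices=num_partitions)
print(rdd.getNumPartitions())  # verify the partitioning took effect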
ChelyYi commented
Maybe this will help you: https://issues.apache.org/jira/browse/SPARK-12261
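SPARK-12261 describes a similar Python worker crash on large inputs. A mitigation often suggested for this class of failure is to give the PySpark worker itself more memory, in addition to the executor. A minimal sketch of the reporter's conf with that setting added (the extra setting and its value are assumptions, not taken from the thread or the JIRA ticket):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark-master-url")  # placeholder master URL from the report
        .setAppName("PySparkShell")
        .set("spark.executor.memory", "6800M")
        # Memory each PySpark worker may use before spilling to disk during
        # aggregation; the 1g value is an illustrative assumption.
        .set("spark.python.worker.memory", "1g"))
sc = SparkContext(conf=conf)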
srowen commented
This is really more of a Spark issue. I don't see spark-sklearn here.