StackOverflowError when predicting a single Vector
bkersbergen opened this issue · 2 comments
bkersbergen commented
Hi,
When predicting a single Vector from an RDD[Vector] on a trained model, a StackOverflowError is thrown.
Predicting the whole RDD[Vector] in one call works fine.
println("clustering single vectors fails")
val singleVector = mymatrix.map { point =>
try {
val prediction = kModel.predict(point)
(point.toString, prediction)
} catch {
case e: Error => println("unable to predict a single vector")
}
}
println(s"singleVector.count():${singleVector.count()}")
println("clustering using multiple vectors, this runs oke")
val predictions = kModel.predict(mymatrix)
val multipleVector = predictions.zip(mymatrix).map(point => (point._2.toString, point._1))
println(s"multipleVector.count():${multipleVector.count()}")
I've put my code with data as an example here: https://github.com/bkersbergen/massive-kmeans-overflow.
2015/06/18 11:10:03:300 [ERROR] [Executor task launch worker-5] org.apache.spark.Logging$class.logError:96 - Exception in task 0.0 in stage 63.0 (TID 31500)
java.lang.StackOverflowError
at com.massivedatascience.divergence.SquaredEuclideanDistanceDivergence$.convexHomogeneous(BregmanDivergence.scala:144)
at com.massivedatascience.clusterer.NonSmoothedPointCenterFactory$class.toPoint(BregmanPointOps.scala:209)
at com.massivedatascience.clusterer.SquaredEuclideanPointOps$.toPoint(BregmanPointOps.scala:260)
at com.massivedatascience.clusterer.KMeansPredictor$class.predictWeighted(KMeansModel.scala:66)
at com.massivedatascience.clusterer.KMeansModel.predictWeighted(KMeansModel.scala:99)
This works with the Spark MLlib k-means implementation; switching to massive-kmeans produces the StackOverflowError shown above.
(You can switch between the MLlib and massivedatascience import statements in the Scala file to see the difference.)
mvplove123 commented
Did you solve this problem?
mvplove123 commented
I have solved this problem. You need to give the executors a larger stack size, for example:
--conf spark.executor.extraJavaOptions=-Xss100m
That works.
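For anyone hitting this later, a minimal sketch (untested) of where that setting goes if you build the SparkConf in code rather than on the spark-submit command line; the app name is just a placeholder, and -Xss100m is the value from the comment above (a smaller stack such as -Xss8m may already be enough):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("massive-kmeans-overflow") // hypothetical app name
  // Executors run the per-point kModel.predict(point) calls, so raise their stack.
  .set("spark.executor.extraJavaOptions", "-Xss100m")
// Note: the driver JVM is already running when SparkConf is read, so if the
// overflow also happens on the driver, set its stack size at launch time,
// e.g. spark-submit --driver-java-options -Xss100m
val sc = new SparkContext(conf)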