crim-ca/patchwork

Exception in thread "main" java.lang.StackOverflowError


I got a StackOverflowError when testing on this data, though I tested on 400K points (not 1M as in the link).

The parameters were:

// PatchWork parameters
val epsilon = Array(30.1, 30.1)
val minPts = 1
val minCellInCluster = 10
val ratio = 0.0
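
For completeness, a minimal sketch of the setup (local mode). The exact PatchWork constructor and run() call are assumptions based on the class name in the stack trace, so the repository's own example should be checked for the real signature:

// Minimal repro sketch; the PatchWork call at the end is assumed, not verified
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("PatchWorkTest").setMaster("local[4]")
val sc = new SparkContext(conf)

// PatchWork parameters (same values as above)
val epsilon = Array(30.1, 30.1)
val minPts = 1
val minCellInCluster = 10
val ratio = 0.0

// Reading and parsing the data
val dataRDD = sc.textFile("datasets/9_1M.csv")
  .map(_.split(","))
  .map(s => Array(s(0).toDouble, s(1).toDouble))
  .cache()

// Assumed call shape (parameter order may differ in the actual class):
// val model = new ca.crim.spark.mllib.clustering.PatchWork(epsilon, minPts, ratio, minCellInCluster).run(dataRDD)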

Hello,

I had no problem reading your data like this:
val dataRDD = sc.textFile("9_1M.csv")
  .map(_.split(","))
  .map(s => Array(s(0).toDouble, s(1).toDouble))
  .cache

Did you have an RDD[Array[Double]]?

Yes, I changed almost nothing in the code.
I only added .setMaster("local[4]") when creating the SparkContext.

Here is the RDD:
// Reading and parsing data
val dataRDD: RDD[Array[Double]] = sc.textFile("datasets/9_1M.csv")
  .map(_.split(","))
  .map(s => Array(s(0).toDouble, s(1).toDouble))
  .cache

Maybe you are using different code. Download the latest version from GitHub and try to run it.

Full stack trace:
Exception in thread "main" java.lang.StackOverflowError
  at scala.collection.SeqLike$class.size(SeqLike.scala:106)
  at scala.collection.AbstractSeq.size(Seq.scala:40)
  at scala.collection.mutable.Builder$class.sizeHint(Builder.scala:69)
  at scala.collection.mutable.ArrayBuffer.sizeHint(ArrayBuffer.scala:47)
  at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:240)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:243)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at scala.Array$.concat(Array.scala:243)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$getNearCell$2.apply(PatchWork.scala:164)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$getNearCell$2.apply(PatchWork.scala:164)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at ca.crim.spark.mllib.clustering.PatchWork.getNearCell(PatchWork.scala:164)
  at ca.crim.spark.mllib.clustering.PatchWork.innerCells(PatchWork.scala:189)
  at ca.crim.spark.mllib.clustering.PatchWork.expandCluster(PatchWork.scala:205)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:220)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:205)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at ca.crim.spark.mllib.clustering.PatchWork.expandCluster(PatchWork.scala:205)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:220)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:205)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at ca.crim.spark.mllib.clustering.PatchWork.expandCluster(PatchWork.scala:205)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:220)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:205)

// The following frames repeat many times:
  at scala.collection.immutable.List.foreach(List.scala:318)
  at ca.crim.spark.mllib.clustering.PatchWork.expandCluster(PatchWork.scala:205)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:220)
  at ca.crim.spark.mllib.clustering.PatchWork$$anonfun$expandCluster$1.apply(PatchWork.scala:205)
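
Since the error is in the main thread, the recursion in expandCluster is blowing the driver's stack. One workaround (not from the repository, just a generic JVM technique) is to run the driver-side clustering call on a thread with a larger stack size, e.g.:

// Workaround sketch: run the clustering on a thread with a bigger stack.
// clusterCall is a placeholder for whatever invokes PatchWork on the RDD.
def runWithLargeStack(clusterCall: () => Unit, stackSizeBytes: Long = 512L * 1024 * 1024): Unit = {
  var failure: Option[Throwable] = None
  val t = new Thread(null, new Runnable {
    def run(): Unit =
      try clusterCall()
      catch { case e: Throwable => failure = Some(e) }
  }, "patchwork-large-stack", stackSizeBytes)
  t.start()
  t.join()
  failure.foreach(e => throw e)
}

Alternatively, a larger -Xss can be passed to the driver JVM (for example through spark.driver.extraJavaOptions), which has the same effect without code changes.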

I've got the same exception in the same recursive function with a 60-million-point 2D dataset and the following parameters:

  • eps=20
  • minPoints=20
  • minCellInCluster=0
  • ratio=0

Have you resolved it in some way? @Mignastor, in your opinion, could tail recursion solve the problem?
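
For what it's worth, the stack trace shows the recursive call happening inside a List.foreach closure, so it is not in tail position and @tailrec on its own would not apply; replacing the recursion with an explicit worklist would bound the depth instead. A generic sketch of that idea (Cell, neighbours and assignCluster are illustrative placeholders, not the actual PatchWork API):

// Generic rewrite of a recursive cluster expansion into an iterative one.
// The worklist lives on the heap, so the JVM thread stack stays shallow.
import scala.collection.mutable

def expandClusterIteratively[Cell](start: Cell,
                                   neighbours: Cell => Seq[Cell],
                                   assignCluster: Cell => Unit): Unit = {
  val stack = mutable.Stack(start)
  val visited = mutable.HashSet.empty[Cell]
  while (stack.nonEmpty) {
    val cell = stack.pop()
    if (visited.add(cell)) {   // add returns false if the cell was already seen
      assignCluster(cell)
      neighbours(cell).foreach(n => stack.push(n))
    }
  }
}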