TooManyEvaluationsException
I'm building a POC for sales forecasting and getting an exception that I can't get rid of:
org.apache.commons.math3.exception.TooManyEvaluationsException: illegal state: maximal count (100.000) exceeded: evaluations
at org.apache.commons.math3.optim.BaseOptimizer$MaxEvalCallback.trigger(BaseOptimizer.java:242)
at org.apache.commons.math3.util.Incrementor.incrementCount(Incrementor.java:155)
at org.apache.commons.math3.optim.BaseOptimizer.incrementEvaluationCount(BaseOptimizer.java:191)
at org.apache.commons.math3.optim.nonlinear.scalar.MultivariateOptimizer.computeObjectiveValue(MultivariateOptimizer.java:114)
at org.apache.commons.math3.optim.nonlinear.scalar.LineSearch$1.value(LineSearch.java:120)
at org.apache.commons.math3.optim.univariate.UnivariateOptimizer.computeObjectiveValue(UnivariateOptimizer.java:149)
at org.apache.commons.math3.optim.univariate.BrentOptimizer.doOptimize(BrentOptimizer.java:225)
at org.apache.commons.math3.optim.univariate.BrentOptimizer.doOptimize(BrentOptimizer.java:43)
at org.apache.commons.math3.optim.BaseOptimizer.optimize(BaseOptimizer.java:153)
at org.apache.commons.math3.optim.univariate.UnivariateOptimizer.optimize(UnivariateOptimizer.java:70)
at org.apache.commons.math3.optim.nonlinear.scalar.LineSearch.search(LineSearch.java:130)
at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.doOptimize(NonLinearConjugateGradientOptimizer.java:282)
at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.doOptimize(NonLinearConjugateGradientOptimizer.java:46)
at org.apache.commons.math3.optim.BaseOptimizer.optimize(BaseOptimizer.java:153)
at org.apache.commons.math3.optim.BaseMultivariateOptimizer.optimize(BaseMultivariateOptimizer.java:65)
at org.apache.commons.math3.optim.nonlinear.scalar.MultivariateOptimizer.optimize(MultivariateOptimizer.java:63)
at org.apache.commons.math3.optim.nonlinear.scalar.GradientMultivariateOptimizer.optimize(GradientMultivariateOptimizer.java:73)
at org.apache.commons.math3.optim.nonlinear.scalar.gradient.NonLinearConjugateGradientOptimizer.optimize(NonLinearConjugateGradientOptimizer.java:244)
at com.cloudera.sparkts.models.ARIMA$.fitWithCSSCGD(ARIMA.scala:198)
at com.cloudera.sparkts.models.ARIMA$.fitModel(ARIMA.scala:107)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2$$anonfun$apply$2.apply(SalesForecastTrainerPOC.scala:94)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2$$anonfun$apply$2.apply(SalesForecastTrainerPOC.scala:90)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2.apply(SalesForecastTrainerPOC.scala:90)
at com.hivemindtechnologies.ms.SalesForecastTrainerPOC$$anonfun$main$2.apply(SalesForecastTrainerPOC.scala:89)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
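As far as I can tell, the exception means the optimizer hit its evaluation budget (the MaxEval passed to commons-math; the message shows 100.000, presumably locale-formatted) before the line search converged. A minimal sketch, independent of spark-ts, that reproduces the same exception type with commons-math directly (the objective function and interval are made up purely for illustration):

import org.apache.commons.math3.analysis.UnivariateFunction
import org.apache.commons.math3.optim.MaxEval
import org.apache.commons.math3.optim.nonlinear.scalar.GoalType
import org.apache.commons.math3.optim.univariate.{BrentOptimizer, SearchInterval, UnivariateObjectiveFunction}

// BrentOptimizer(relative tolerance, absolute tolerance)
val optimizer = new BrentOptimizer(1e-10, 1e-14)
val objective = new UnivariateObjectiveFunction(new UnivariateFunction {
  override def value(x: Double): Double = math.sin(x) / x // arbitrary demo function
})

// MaxEval caps how often the objective may be evaluated; with an
// unrealistically small budget the optimizer throws
// TooManyEvaluationsException instead of returning a result.
optimizer.optimize(new MaxEval(3), objective, GoalType.MINIMIZE, new SearchInterval(1.0, 10.0))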
The TimeSeriesRDD is built via:
val salesTimeSeriesRDD = TimeSeriesRDD.timeSeriesRDDFromObservations(dateTimeIndex, salesDF, "soldAt", "productId", "quantity")
val keyWithModelRDD = salesTimeSeriesRDD map {
  case (key, tsVector) => (key, ARIMA.fitModel(1, 0, 1, tsVector)) // TooManyEvaluationsException is thrown here
}
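One (hypothetical) way to keep the job alive when individual series fail to converge would be to wrap the fit in a Try per key and split converged from failed keys afterwards; a sketch:

import scala.util.{Failure, Success, Try}

// Fit per key, keeping non-converging keys for later inspection
// instead of letting one bad series fail the whole Spark job.
val fitAttemptsRDD = salesTimeSeriesRDD map {
  case (key, tsVector) => (key, Try(ARIMA.fitModel(1, 0, 1, tsVector)))
}
val fittedRDD  = fitAttemptsRDD collect { case (key, Success(model)) => (key, model) }
val failedKeys = fitAttemptsRDD collect { case (key, Failure(_)) => key }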
If the DataFrame contains data for only one key, everything works fine.
The series data looks like this and currently contains 251 values per key:
[NaN,337.0,27.0,242.0,226.0,142.0,252.0,215.0,280.0,1.0,437.0,338.0,403.0,840.0,723.0,1129.0,768.0,208.0,177.0,238.0,275.0,307.0,13.0,201.0,383.0,220.0,230.0,303.0,476.0,9.0,655.0,424.0,414.0,414.0,319.0,330.0,1.0,202.0,127.0,118.0,135.0,167.0,342.0,5.0,256.0,204.0,188.0,189.0,249.0,358.0,NaN,165.0,105.0,83.0,106.0,141.0,229.0,1.0,171.0,109.0,85.0,131.0,176.0,319.0,27.0,172.0,168.0,152.0,136.0,161.0,274.0,25.0,166.0,146.0,155.0,321.0,366.0,436.0,16.0,368.0,244.0,242.0,200.0,0.0,296.0,0.0,157.0,188.0,146.0,202.0,174.0,15.0,131.0,158.0,164.0,181.0,199.0,262.0,20.0,196.0,152.0,137.0,122.0,177.0,305.0,6.0,498.0,159.0,119.0,127.0,144.0,240.0,6.0,153.0,108.0,100.0,105.0,134.0,172.0,146.0,209.0,157.0,0.0,271.0,277.0,12.0,275.0,178.0,187.0,222.0,291.0,356.0,182.0,102.0,117.0,152.0,185.0,0.0,474.0,549.0,578.0,226.0,695.0,547.0,9.0,386.0,315.0,278.0,253.0,315.0,328.0,1.0,61.0,34.0,41.0,93.0,118.0,195.0,NaN,191.0,144.0,106.0,113.0,151.0,272.0,1.0,142.0,103.0,103.0,263.0,161.0,263.0,NaN,258.0,283.0,301.0,390.0,388.0,588.0,5.0,440.0,399.0,348.0,267.0,310.0,443.0,1.0,310.0,190.0,218.0,274.0,343.0,409.0,0.0,245.0,88.0,139.0,146.0,178.0,244.0,NaN,191.0,154.0,124.0,135.0,148.0,189.0,1.0,216.0,234.0,210.0,216.0,262.0,315.0,0.0,258.0,252.0,183.0,232.0,264.0,425.0,NaN,366.0,370.0,358.0,374.0,355.0,547.0,0.0,561.0,437.0,339.0,323.0,360.0,483.0,3.0,452.0,330.0,354.0,148.0,140.0,192.0,9.0,220.0,166.0,214.0,184.0,213.0,329.0,2.0,236.0]
When the NaNs are eliminated, models can be generated for some keys, but forecasting still crashes sometimes.
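If the NaNs themselves are what pushes the optimizer into non-convergence, interpolating them before fitting might help. A sketch, assuming a spark-ts version that provides TimeSeriesRDD.fill:

// Interpolate NaN gaps (here linearly) before fitting the models.
val filledRDD = salesTimeSeriesRDD.fill("linear")
val keyWithModelRDD = filledRDD map {
  case (key, tsVector) => (key, ARIMA.fitModel(1, 0, 1, tsVector))
}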
It seems that this is a concurrency problem. If I collect the RDD's data to the driver and run the model training sequentially via
import org.apache.spark.mllib.linalg.Vectors

salesTimeSeriesRDD.collect().foreach {
  case (key, tsVector) =>
    val model = ARIMA.fitModel(5, 0, 1, tsVector)
    val forecasts = model.forecast(Vectors.dense(Array.emptyDoubleArray), 40)
    log.info(s"forecasts for $key[${forecasts.size}]: $forecasts")
}
everything works fine.
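To test the concurrency hypothesis off-Spark, the same collected data could be run through a sequential and a parallel pass on the driver; a diagnostic sketch using Scala parallel collections:

val collected = salesTimeSeriesRDD.collect()

// Sequential pass: works, per the observation above.
collected foreach { case (_, tsVector) => ARIMA.fitModel(5, 0, 1, tsVector) }

// Parallel pass on the driver: if this alone reproduces the exception,
// that points at a concurrency issue; if it also works, the failures
// more likely depend on the data or the chosen (p, d, q) order.
collected.par foreach { case (_, tsVector) => ARIMA.fitModel(5, 0, 1, tsVector) }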
The problem is that the code dives very quickly into deep mathematical territory via Apache's commons-math library.
To be honest: I'm no expert in ML/linear algebra, but I am familiar with Spark.
I just wanted to use this lib to implement a proof of concept for time series forecasting.
How did you solve this problem?