multiple calling of LOF from Scala leads to AbortException: DBID range allocation error
edouardfouche opened this issue · 2 comments
Hello,
I am using the LOF implementation from Elki for some experiments. Since I work in scala, I have written a wrapper to call it from the jar (version 0.7.2). The wrapper is the following:
import de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF
import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection
import de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski
import scala.collection.mutable.ListBuffer
/**
* Created by fouchee on 04.09.17.
*/
case class ElkiLOF(k: Int) extends OutlierDetector {
def computeScores(instances: Array[Array[Double]]): Array[(Int, Double)] = {
val distance = new minkowski.EuclideanDistanceFunction
val lof = new LOF(k, distance)
val dbc = new ArrayAdapterDatabaseConnection(instances) // Adapter to load data from an existing array.
val db = new StaticArrayDatabase(dbc, null) // Create a database (which may contain multiple relations!)
db.initialize()
val result = lof.run(db).getScores()
var scoreList = new ListBuffer[Double]()
val DBIDs = result.iterDBIDs()
while ( {
DBIDs.valid
}) {
scoreList += result.doubleValue(DBIDs)
DBIDs.advance
}
scoreList
val corrected = scoreList.map {
case d if d.isNaN => 1.0 // Or whatever value you'd prefer.
case d if d.isNegInfinity => 1.0 // Or whatever value you'd prefer.
case d if d.isPosInfinity => 1.0 // Or whatever value you'd prefer.
case d => d
}
corrected.toArray.zipWithIndex.map(x => (x._2, x._1))
}
}
It works well, and the implementation is really fast I must say. However, if I run it multiple times in parallel (and I mean with a lot of different data sets and repetition) at some point I run into the following error:
[error] Exception in thread "main" de.lmu.ifi.dbs.elki.utilities.exceptions.AbortException: DBID range allocation error - too many objects allocated!
[error] at de.lmu.ifi.dbs.elki.database.ids.integer.TrivialDBIDFactory.generateStaticDBIDRange(TrivialDBIDFactory.java:72)
[error] at de.lmu.ifi.dbs.elki.database.ids.DBIDUtil.generateStaticDBIDRange(DBIDUtil.java:196)
[error] at de.lmu.ifi.dbs.elki.database.StaticArrayDatabase.initialize(StaticArrayDatabase.java:129)
[error] at com.edouardfouche.detectors.ElkiLOF$.computeScores(ElkiLOF.scala:29)
I don't know how to correct this error. When I run in sequential, it occurs at some point as well. It seems that the underlying TrivialDBIDFactory does not deallocate the DBIDs that are not in use anymore. I found here the piece of code that launch the error http://www.massapi.com/class/de/lmu/ifi/dbs/elki/utilities/exceptions/AbortException-5.html
Any idea how to avoid that?
Thank you,
Edouard
It runs out of unique object ids. In total, you supposedly ran it on more than 2^31 objects.
The current class will assign every object ID only once. This serves as a safety guard to prevent mixing up IDs from different data sets.
If you use the ArrayAdapterDatabaseConnection
constructor with a startid=0
, then this will not happen.
ELKI will be even faster if you add an index, for example a cover tree (on large enough data). An example (using the newer ELKIBuilder
API, and a R*-tree index) can be found here: https://github.com/elki-project/elki/blob/master/addons/tutorial/src/main/java/tutorial/javaapi/GeoIndexing.java
There is EuclideanDistanceFunction.STATIC
, as this distance does not have parameters.
From a LOF logic, infinite LOF is an outlier. This occurs when you have minpts duplicates (at distance 0, and thus 'density' infinity), and a nearby point.
Closing, because it is behaving as intended, and there is a constructor of ArrayAdapterDatabaseConnection
to solve your problem.
Grüße aus Heidelberg.
Thank you very much !