ReForeSt is a distributed, scalable implementation of the Random Forest (RF) learning algorithm, designed for fast and memory-efficient processing. Its main contributions are: (i) a novel approach to implementing RF in a distributed environment with efficient in-memory processing; (ii) it is faster and more memory efficient than MLlib, the de facto standard Spark machine learning library; (iii) its level of parallelism is self-configuring.
A pre-packaged version of ReForeSt, in zip and tar.gz format, is available in the "resources/package" directory. Alternatively, ReForeSt can be built with Maven:

```
mvn clean package
```
The following example trains a Random Forest classifier and evaluates its accuracy on the test set:

```scala
import reforest.rf.{RFProperty, RFRunner}
// RFParameterBuilder, RFParameterType, ReForeStTrainerBuilder and CCUtil must also
// be imported from their ReForeSt packages (package paths not shown here).

// Create the ReForeSt configuration.
val property = RFParameterBuilder.apply
  .addParameter(RFParameterType.Dataset, "data/test10k-labels")
  .addParameter(RFParameterType.NumFeatures, 794)
  .addParameter(RFParameterType.NumClasses, 10)
  .addParameter(RFParameterType.NumTrees, 100)
  .addParameter(RFParameterType.Depth, 10)
  .addParameter(RFParameterType.BinNumber, 32)
  .addParameter(RFParameterType.SparkMaster, "local[4]")
  .addParameter(RFParameterType.SparkCoresMax, 4)
  // 4 partitions for each of the 4 cores.
  .addParameter(RFParameterType.SparkPartition, 4 * 4)
  .addParameter(RFParameterType.SparkExecutorMemory, "4096m")
  .addParameter(RFParameterType.SparkExecutorInstances, 1)
  .build

val sc = CCUtil.getSparkContext(property)

// Create the Random Forest trainer (timing includes setup and training).
val timeStart = System.currentTimeMillis()
val rfRunner = ReForeStTrainerBuilder.apply(property).build(sc)

// Train a Random Forest model.
val model = rfRunner.trainClassifier()
val timeEnd = System.currentTimeMillis()

// Evaluate the model on the test instances: pair each true label with its prediction.
val testData = rfRunner.getDataLoader.getTestingData
val labelAndPreds = testData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}

// Test error = fraction of misclassified test instances.
val testErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testData.count()
println("Accuracy: " + (1 - testErr))
println("Time (ms): " + (timeEnd - timeStart))

rfRunner.sparkStop()
```
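The evaluation step in the example computes the test error as the number of misclassified test instances divided by the total number of test instances, and reports accuracy as its complement. The sketch below reproduces that arithmetic on a plain Scala `Seq` so it can be checked without Spark or ReForeSt. `EvalSketch`, `testError` and `accuracy` are hypothetical names used only for illustration; they are not part of the ReForeSt API.

```scala
// Hypothetical helper, not part of ReForeSt: mirrors the test-error computation
// from the example above on local (label, prediction) pairs instead of a Spark RDD.
object EvalSketch {
  // Fraction of pairs whose prediction differs from the true label.
  def testError(labelAndPreds: Seq[(Double, Double)]): Double = {
    require(labelAndPreds.nonEmpty, "need at least one test instance")
    val wrong = labelAndPreds.count { case (label, pred) => label != pred }
    wrong.toDouble / labelAndPreds.size
  }

  // Accuracy is the complement of the test error.
  def accuracy(labelAndPreds: Seq[(Double, Double)]): Double =
    1.0 - testError(labelAndPreds)

  def main(args: Array[String]): Unit = {
    // Four test instances, one of which (label 0.0, predicted 1.0) is misclassified.
    val pairs = Seq((1.0, 1.0), (0.0, 1.0), (2.0, 2.0), (3.0, 3.0))
    println(s"Test error: ${testError(pairs)}") // prints "Test error: 0.25"
    println(s"Accuracy: ${accuracy(pairs)}")    // prints "Accuracy: 0.75"
  }
}
```

On an RDD, `count` is an action, so the Spark version runs one job for the misclassified count and another for the total; caching the testing data before evaluation may avoid reloading it.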
To get started quickly, we provide a pre-built Maven project with all the settings and configuration needed to import it automatically into IntelliJ. The pre-built project is available in zip and tar.gz format in the "resources/package" directory.
ReForeSt has been developed at Smartlab - DIBRIS - University of Genoa.