Version: 1.0.0
API Scaladoc: Reducer
A small utility that simplifies the naive decision tree models produced by Spark MLlib.
MLlib produces trees in which some nodes have two leaves predicting the same label, for instance:
If (feature 5 > 8.0)
 If (feature 2 <= 4473.989772796631)
  If (feature 0 in {4.0})
   Predict: 0.0
  Else (feature 0 not in {4.0})
   Predict: 0.0
 Else (feature 2 > 4473.989772796631)
  If (feature 3 <= 126.0)
   Predict: 0.0
  Else (feature 3 > 126.0)
   Predict: 2.0
Such a tree can be simplified, which matters for projects that apply the decision tree model at large scale and need high performance (fewer nodes mean fewer conditions to evaluate and thus less CPU usage):
If (feature 5 > 8.0)
 If (feature 2 <= 4473.989772796631)
  Predict: 0.0
 Else (feature 2 > 4473.989772796631)
  If (feature 3 <= 126.0)
   Predict: 0.0
  Else (feature 3 > 126.0)
   Predict: 2.0
This is particularly helpful for decision trees with few possible labels, such as a true/false classification tree, since sibling leaves are then more likely to predict the same label.
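The simplification described above can be sketched as a bottom-up pass over the tree. The snippet below is only an illustration on a hypothetical, simplified tree type (`Tree`, `Leaf`, `Split`, `mergeLeaves`, `countNodes` and `predict` are names made up for this sketch). It does not use Spark's classes and is not necessarily how `Reducer.mergeLeafs` is implemented.

```scala
object LeafMergeSketch {

  // Hypothetical, simplified tree type used for illustration only
  // (this is NOT Spark MLlib's Node/DecisionTreeModel).
  sealed trait Tree
  final case class Leaf(prediction: Double) extends Tree
  // `goesLeft` stands in for a split condition such as "feature 3 <= 126.0".
  final case class Split(goesLeft: Seq[Double] => Boolean, left: Tree, right: Tree) extends Tree

  // Bottom-up pass: simplify both children first, then collapse the split
  // when the two simplified children are leaves predicting the same label.
  def mergeLeaves(tree: Tree): Tree = tree match {
    case leaf: Leaf => leaf
    case Split(cond, left, right) =>
      (mergeLeaves(left), mergeLeaves(right)) match {
        case (Leaf(a), Leaf(b)) if a == b => Leaf(a)
        case (l, r)                       => Split(cond, l, r)
      }
  }

  // Total number of nodes (splits and leaves).
  def countNodes(tree: Tree): Int = tree match {
    case _: Leaf        => 1
    case Split(_, l, r) => 1 + countNodes(l) + countNodes(r)
  }

  // Walks the tree, following the left child when the condition holds.
  def predict(tree: Tree, features: Seq[Double]): Double = tree match {
    case Leaf(p)           => p
    case Split(cond, l, r) => if (cond(features)) predict(l, features) else predict(r, features)
  }

  // The subtree found under "feature 5 > 8.0" in the example above.
  val naive: Tree = Split(
    f => f(2) <= 4473.989772796631,
    Split(f => f(0) == 4.0, Leaf(0.0), Leaf(0.0)),
    Split(f => f(3) <= 126.0, Leaf(0.0), Leaf(2.0))
  )

  val reduced: Tree = mergeLeaves(naive)
}
```

In this sketch the naive subtree has 7 nodes and the reduced one has 5, while both return the same prediction for any input. Because children are simplified before their parent is examined, a merge can cascade upwards when the parent's other child is a leaf with the same label.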
For additional details: API Scaladoc: Reducer
import org.apache.spark.mllib.tree.Reducer
import org.apache.spark.mllib.tree.model.DecisionTreeModel // spark mllib
// naiveDecisionTreeModel is a DecisionTreeModel previously trained with MLlib
val reducedDecisionTreeModel: DecisionTreeModel = Reducer.mergeLeafs(naiveDecisionTreeModel)
With sbt, just add this one line to your build.sbt:
libraryDependencies += "mllib_decision_tree_reducer" % "mllib_decision_tree_reducer" % "1.0.0" from "https://github.com/xavierguihot/mllib_decision_tree_reducer/releases/download/v1.0.0/mllib_decision_tree_reducer-1.0.0.jar"
To build the project with sbt:
sbt assembly