Refer to the Code-Structure document for an overview of the code structure.
The main functions in the package are demonstrated below.
First build a simple gbm with the following code;
library(h2o)
library(h2oTreeHelpR)
h2o.no_progress()
h2o.init()
prostate.hex = h2o.uploadFile(path = system.file("extdata",
"prostate.csv",
package = "h2o"),
destination_frame = "prostate.hex")
prostate.hex["RACE"] = as.factor(prostate.hex["RACE"])
prostate.hex["DPROS"] = as.factor(prostate.hex["DPROS"])
expl_cols <- c("AGE", "RACE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON")
prostate.gbm = h2o.gbm(x = expl_cols,
y = "CAPSULE",
training_frame = prostate.hex,
ntrees = 3,
max_depth = 1,
learn_rate = 0.1)
prostate.gbm
## Model Details:
## ==============
##
## H2ORegressionModel: gbm
## Model ID: GBM_model_R_1538296273962_40
## Model Summary:
## number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1 3 3 261 1
## max_depth mean_depth min_leaves max_leaves mean_leaves
## 1 1 1.00000 2 2 2.00000
##
##
## H2ORegressionMetrics: gbm
## ** Reported on training data. **
##
## MSE: 0.2193848
## RMSE: 0.4683854
## MAE: 0.4565917
## RMSLE: 0.3295538
## Mean Residual Deviance : 0.2193848
h2o_tree_dfs = h2o_tree_convertR(h2o_model = prostate.gbm)
h2o_tree_dfs[[1]]
## node node_text predictions
## 1 SG_0_Node_0 [shape=box, fontsize=14, label=GLEASON < 6.5] NA
## 2 SG_0_Node_3 [fontsize=14, label=-0.019210527] -0.01921053
## 3 SG_0_Node_4 [fontsize=14, label=0.023479532] 0.02347953
## node_text_label left_split right_split left_split_levels
## 1 GLEASON < 6.5 SG_0_Node_3 SG_0_Node_4 [NA]|<
## 2 -0.019210527 <NA> <NA> <NA>
## 3 0.023479532 <NA> <NA> <NA>
## right_split_levels NA_direction node_variable_type split_column
## 1 >= left numeric GLEASON
## 2 <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA>
## node_split_point
## 1 6.5
## 2 NA
## 3 NA
terminal_node_rules <- extract_split_rules(h2o_tree_dfs)
terminal_node_rules[[1]]
## terminal_node terminal_node_depth terminal_node_path
## 1 SG_0_Node_3 1 SG_0_Node_0.SG_0_Node_3
## 2 SG_0_Node_4 1 SG_0_Node_0.SG_0_Node_4
## terminal_node_directions
## 1 (GLEASON < 6.5 | is.na(GLEASON))
## 2 GLEASON >= 6.5
Specifically this a mapping between terminal node paths represented as i. L(eft) or R(ight) directions at each node (output from h2o.predict_lead_node_assignment) and ii. variable conditions (i.e. A > x & B in (y, z) etc).
terminal_node_mapping <- map_h2o_encoding(h2o_tree_dfs)
terminal_node_mapping[[1]]
## terminal_node terminal_node_depth terminal_node_path
## 1 SG_0_Node_3 1 SG_0_Node_0.SG_0_Node_3
## 2 SG_0_Node_4 1 SG_0_Node_0.SG_0_Node_4
## terminal_node_directions terminal_node_directions_h2o
## 1 (GLEASON < 6.5 | is.na(GLEASON)) L
## 2 GLEASON >= 6.5 R
The last column terminal_node_directions_h2o shows the terminal nodes as they are represented in the output of the h2o.predict_lead_node_assignment function.