Forest Construction Hangs
Yesse42 opened this issue · 1 comments
Hello. I was attempting to build a forest of decision trees using Float32 data, and construction seems to hang for certain input data. Here's a small example.
using DecisionTree
# Tree construction does not seem to hang for Float64 with this array, but does for Float32 and Float16.
# I have had hangs with Float64 with different data.
indep=Float32.([ 9.4 9.4 1.1
9.4 9.4 -0.0
9.4 9.4 1.9
9.4 9.4 1.4
9.4 9.4 1.1
9.4 9.4 0.0])
dep=Float32.([ -0.4
-0.2
-1.1
0.0
0.0
0.0])
#The decision tree construction hangs for 9.4, -1.0, and 15.6, but not for 2.0 or 2.5??
indep[indep.≈9.4] .= 15.6
display(dep)
display(indep)
# This occasionally doesn't hang the first time, but it has always done so on the second run.
build_forest(dep, indep, size(indep, 2), 10, 0.7)
When I managed to interrupt this from the keyboard in the REPL, it appeared to be stuck somewhere in the threading code.
Here are the versions and hardware I'm using:
DecisionTree: v0.10.11
Julia: v"1.6.3" for Intel Mac (downloaded as a binary from the Julia Website) running through Rosetta 2
Computer: MacBook Air with M1 chip.
I also downloaded Julia 1.7 for Intel and AArch64, and got the same hang.
I got around to looking at the problem in VS Code's debugger. The problem does not seem to be multithreading: I removed every Threads.@threads and the hang persisted. Instead, the tree's depth appears to grow indefinitely, with each new split going only to the left. Using the same matrices as above, this is one of the resulting trees; it would have caused a hang had I not set max_depth to 10. The tree in which this malfunction occurs differs from run to run, and sometimes it does not happen at all.
julia> forest = build_forest(dep, indep, size(indep, 2), 4, 0.7, 10); forest.trees
4-element Vector{Union{Leaf{Float32}, Node{Float32, Float32}}}:
Decision Tree
Leaves: 2
Depth: 1
Decision Tree
Leaves: 2
Depth: 1
Decision Tree
Leaves: 11
Depth: 10
Decision Leaf
Majority: 0.0
Samples: 4
julia> pathological=forest.trees[3]; print_tree(pathological)
Feature 3, Threshold 0.0
L-> Feature 3, Threshold 0.0
    L-> Feature 3, Threshold 0.0
        L-> Feature 3, Threshold 0.0
            L-> Feature 3, Threshold 0.0
                L-> Feature 3, Threshold 0.0
                    L-> Feature 3, Threshold 0.0
                        L-> Feature 3, Threshold 0.0
                            L-> Feature 3, Threshold 0.0
                                L-> Feature 3, Threshold 0.0
                                    L-> 0.0 : 2/3
                                    R-> 0.0 : 0/0
                                R-> 0.0 : 0/0
                            R-> 0.0 : 0/0
                        R-> 0.0 : 0/0
                    R-> 0.0 : 0/0
                R-> 0.0 : 0/0
            R-> 0.0 : 0/0
        R-> 0.0 : 0/0
    R-> 0.0 : 0/0
R-> -1.1 : 1/1
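The pattern in the printout above (the same feature and threshold repeated down the left branch) can also be checked programmatically. Below is a minimal sketch, not DecisionTree.jl code: `DemoLeaf`, `DemoNode`, `left_chain`, and `repeated_left_splits` are hypothetical stand-ins that only mirror the feature id / threshold / left / right layout shown by print_tree, so the snippet runs without the package.

```julia
# Hypothetical stand-in types for illustration only; these are NOT
# DecisionTree.jl's own Leaf/Node types.
struct DemoLeaf
    majority::Float32
end

struct DemoNode
    featid::Int
    featval::Float32
    left::Union{DemoLeaf, DemoNode}
    right::Union{DemoLeaf, DemoNode}
end

# Count how many consecutive left descendants repeat the root's exact split
# (same feature id and same threshold).
function repeated_left_splits(node::DemoNode)
    n = 0
    child = node.left
    while child isa DemoNode && child.featid == node.featid && child.featval == node.featval
        n += 1
        child = child.left
    end
    return n
end

# A leaf has no splits at all.
repeated_left_splits(::DemoLeaf) = 0

# Build a left-only chain of `n_splits` identical splits, shaped like the
# pathological tree printed above (feature 3, threshold 0.0).
function left_chain(n_splits::Int)
    tree = DemoLeaf(0.0f0)
    for _ in 1:n_splits
        tree = DemoNode(3, 0.0f0, tree, DemoLeaf(0.0f0))
    end
    return tree
end

println(repeated_left_splits(left_chain(10)))  # prints 9
```

A healthy tree should report 0 or a very small number here, since repeating the same split on the same samples cannot separate them any further. A count that grows with max_depth is the signature of the behaviour described in this issue.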
It seems to be endlessly splitting to the left on feature 3 at the same threshold of 0 each time.
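One property of the example data that may be worth ruling out (an observation only, not a confirmed cause): the third column of indep contains both -0.0 and 0.0, and the repeated threshold is 0.0. In Julia these two values compare equal under `==` and `<`, but are ordered and distinct under `isless` and `isequal`, which is what `sort` uses by default. The sketch below only demonstrates that standard language behaviour; whether the splitter in DecisionTree.jl is affected by it is an assumption that would need checking against the package source.

```julia
# Signed zeros as they appear in feature 3 of the example (after Float32 conversion).
a, b = -0.0f0, 0.0f0

println(a == b)         # true:  numerically equal
println(a < b)          # false: `<` does not separate them
println(isless(a, b))   # true:  the sorting order puts -0.0 before 0.0
println(isequal(a, b))  # false: treated as distinct values
println(sort([b, a]))   # Float32[-0.0, 0.0]
println((a + b) / 2)    # 0.0:   the midpoint of the two "distinct" values
```

If signed zeros were involved, a split placed between -0.0 and 0.0 would get a threshold of 0.0, and any partition based on `<` or `==` against that threshold would put both values on the same side. Replacing -0.0 with 0.0 in indep and rerunning would be a quick way to test this.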