/decision-tree-information-gain-visualizer

generate intermediate calculation for information gain algorithm

Primary LanguageJupyter Notebook

Generate intermediate calculation for information gain algorithm

given table:

X Y Z Class
1 1 1 A
1 1 0 A
0 0 1 B
1 0 0 B

output:

G is shorthand for Gain, E is shorthand for Entropy

Information gain of splitting X:
G(X)= E(Class) - E(X, Class)
    = 1 - ( p(X=0)E(0, 1)+ p(X=1)E(2, 1))
    = 1 - ( 0.75 × 0+ 0.75×(-0.667 × log(0.667) - 0.333 × log(0.333)))
    = 0.31127812445913283

Information gain of splitting Y:
G(Y)= E(Class) - E(Y, Class)
    = 1 - ( p(Y=0)E(0, 2)+ p(Y=1)E(2, 0))
    = 1 - ( 0.5 × 0+ 0.5×0)
    = 1

Information gain of splitting Z:
G(Z)= E(Class) - E(Z, Class)
    = 1 - ( p(Z=0)E(1, 1)+ p(Z=1)E(1, 1))
    = 1 - ( 0.5 × (-0.5 × log(0.5) - 0.5 × log(0.5))+ 0.5×(-0.5 × log(0.5) - 0.5 × log(0.5)))
    = 0.0