/Decision-Tree

Python implementation of decision tree

Primary LanguageJupyter Notebook

Decision-Tree

Python implementation of decision tree (algorithm ID3)

File Structure

kmeans.py: k means for 1d data

entropy.py: caculate entropy, conditional entropy, empirical entropy and empirical conditional entropy

decision_tree.py: node and decision tree

You can visualize something in the jupyter notebook file visualization.ipynb

Dataset

Iris dataset from UCI machine learning repository

Usage

python main.py

you can expect to get the following output in the terminal:

Train: 117 samples, Test: 33
Attribute sepal_length has been discretized
Attribute sepal_width has been discretized
Attribute petal_length has been discretized
Attribute petal_width has been discretized


graph TB
root--root-->a(petal width)
a(petal width)--petal_width<0.7837-->a0(Iris-setosa)
a(petal width)--0.7837<=petal_width<1.6794-->a1(sepal length)
a1(sepal length)--sepal_length<5.7058-->a10(Iris-versicolor)
a1(sepal length)--sepal_length>=5.7058-->a11(sepal width)
a11(sepal width)--sepal_width<2.8646-->a110(petal length)
a110(petal length)--petal_length>=3.1637-->a1100(Iris-versicolor)
a11(sepal width)--2.8646<=sepal_width<3.5024-->a111(petal length)
a111(petal length)--petal_length>=3.1637-->a1110(Iris-versicolor)
a(petal width)--petal_width>=1.6794-->a2(sepal length)
a2(sepal length)--sepal_length<5.7058-->a20(Iris-virginica)
a2(sepal length)--sepal_length>=5.7058-->a21(sepal width)
a21(sepal width)--sepal_width<2.8646-->a210(Iris-virginica)
a21(sepal width)--2.8646<=sepal_width<3.5024-->a211(petal length)
a211(petal length)--petal_length>=3.1637-->a2110(Iris-virginica)
a21(sepal width)--sepal_width>=3.5024-->a212(Iris-virginica)

graph TB
root--root-->a(petal width)
a(petal width)--petal_width<0.7837-->a0(Iris-setosa)
a(petal width)--0.7837<=petal_width<1.6794-->a1(Iris-versicolor)
a(petal width)--petal_width>=1.6794-->a2(Iris-virginica)

Predict in test data
     sepal_length  sepal_width  petal_length  petal_width            class       pred_class
6             4.6          3.4           1.4          0.3      Iris-setosa      Iris-setosa
18            5.7          3.8           1.7          0.3      Iris-setosa      Iris-setosa
20            5.4          3.4           1.7          0.2      Iris-setosa      Iris-setosa
24            4.8          3.4           1.9          0.2      Iris-setosa      Iris-setosa
25            5.0          3.0           1.6          0.2      Iris-setosa      Iris-setosa
27            5.2          3.5           1.5          0.2      Iris-setosa      Iris-setosa
36            5.5          3.5           1.3          0.2      Iris-setosa      Iris-setosa
37            4.9          3.1           1.5          0.1      Iris-setosa      Iris-setosa
38            4.4          3.0           1.3          0.2      Iris-setosa      Iris-setosa
52            6.9          3.1           4.9          1.5  Iris-versicolor  Iris-versicolor
61            5.9          3.0           4.2          1.5  Iris-versicolor  Iris-versicolor
63            6.1          2.9           4.7          1.4  Iris-versicolor  Iris-versicolor
64            5.6          2.9           3.6          1.3  Iris-versicolor  Iris-versicolor
66            5.6          3.0           4.5          1.5  Iris-versicolor  Iris-versicolor
69            5.6          2.5           3.9          1.1  Iris-versicolor  Iris-versicolor
71            6.1          2.8           4.0          1.3  Iris-versicolor  Iris-versicolor
80            5.5          2.4           3.8          1.1  Iris-versicolor  Iris-versicolor
133           6.3          2.8           5.1          1.5   Iris-virginica  Iris-versicolor
102           7.1          3.0           5.9          2.1   Iris-virginica   Iris-virginica
103           6.3          2.9           5.6          1.8   Iris-virginica   Iris-virginica
105           7.6          3.0           6.6          2.1   Iris-virginica   Iris-virginica
107           7.3          2.9           6.3          1.8   Iris-virginica   Iris-virginica
112           6.8          3.0           5.5          2.1   Iris-virginica   Iris-virginica
113           5.7          2.5           5.0          2.0   Iris-virginica   Iris-virginica
114           5.8          2.8           5.1          2.4   Iris-virginica   Iris-virginica
120           6.9          3.2           5.7          2.3   Iris-virginica   Iris-virginica
122           7.7          2.8           6.7          2.0   Iris-virginica   Iris-virginica
124           6.7          3.3           5.7          2.1   Iris-virginica   Iris-virginica
131           7.9          3.8           6.4          2.0   Iris-virginica   Iris-virginica
139           6.9          3.1           5.4          2.1   Iris-virginica   Iris-virginica
140           6.7          3.1           5.6          2.4   Iris-virginica   Iris-virginica
141           6.9          3.1           5.1          2.3   Iris-virginica   Iris-virginica
146           6.3          2.5           5.0          1.9   Iris-virginica   Iris-virginica

copy the mermaid code to markdown, and you would expect to get the following tree:

graph LR
root--root-->a(house)
a(house)--house==0-->a0(work)
a0(work)--work==0-->a00('0')
a0(work)--work==1-->a01('1')
a(house)--house==1-->a1('1')

graph TB
root--root-->a(petal width)
a(petal width)--petal_width<0.7837-->a0(Iris-setosa)
a(petal width)--0.7837<=petal_width<1.6794-->a1(sepal length)
a1(sepal length)--sepal_length<5.7058-->a10(Iris-versicolor)
a1(sepal length)--sepal_length>=5.7058-->a11(sepal width)
a11(sepal width)--sepal_width<2.8646-->a110(petal length)
a110(petal length)--petal_length>=3.1637-->a1100(Iris-versicolor)
a11(sepal width)--2.8646<=sepal_width<3.5024-->a111(petal length)
a111(petal length)--petal_length>=3.1637-->a1110(Iris-versicolor)
a(petal width)--petal_width>=1.6794-->a2(sepal length)
a2(sepal length)--sepal_length<5.7058-->a20(Iris-virginica)
a2(sepal length)--sepal_length>=5.7058-->a21(sepal width)
a21(sepal width)--sepal_width<2.8646-->a210(Iris-virginica)
a21(sepal width)--2.8646<=sepal_width<3.5024-->a211(petal length)
a211(petal length)--petal_length>=3.1637-->a2110(Iris-virginica)
a21(sepal width)--sepal_width>=3.5024-->a212(Iris-virginica)
graph TB
root--root-->a(petal width)
a(petal width)--petal_width<0.7837-->a0(Iris-setosa)
a(petal width)--0.7837<=petal_width<1.6794-->a1(Iris-versicolor)
a(petal width)--petal_width>=1.6794-->a2(Iris-virginica)

Reference

  • Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
  • 李航et al. 统计学习方法. Qing hua da xue chu ban she, 2012.