- problem: decide which color I like
- data size: 200
- features:
- R - red
- G - green
- B - blue
- each channel ranges from 0 to 255
- rule 0: 60 < R < 155 and G < 100 and B > 160 ===> yes
- training: 60%
- testing: 40%
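The setup above can be sketched in scikit-learn. The 200-sample size, rule 0, and the 60/40 split come from the notes; the random seed and the uniform sampling of RGB values are assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)  # assumed seed, for reproducibility

# 200 samples, each an (R, G, B) triple drawn uniformly from 0..255 (assumed)
X = rng.integers(0, 256, size=(200, 3))
R, G, B = X[:, 0], X[:, 1], X[:, 2]

# rule 0: 60 < R < 155 and G < 100 and B > 160 ===> yes (label 1)
y = ((R > 60) & (R < 155) & (G < 100) & (B > 160)).astype(int)

# 60% training, 40% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=0)
print(X_train.shape, X_test.shape)  # (120, 3) (80, 3)
```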
Step 3: Compare the rules in the decision tree from Step 2 with rule 0, the rule used to generate the "absolutely right" data
- rule 0: 60 < R < 155 and G < 100 and B > 160 ===> yes
- rule 1: R ≤ 54 and G ≤ 118.5 and B ≤ 164.5 ===> no
- rule 2: B ≤ 160.5 ===> no
- rule 1: similar to the absolutely-right rule (its thresholds roughly mirror rule 0's boundaries)
- rule 2: far from the absolutely-right rule
- result: only 97% accuracy
- guess: feature values of label 0 overlap with label 1 near the boundary (as seen in rule 2)
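Rules like rule 1 and rule 2 can be read directly off a fitted tree with `export_text`, where each root-to-leaf path is one learned rule. A minimal sketch; the data generation and seed are assumptions, so the exact thresholds will differ from the notes:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)  # assumed seed
X = rng.integers(0, 256, size=(200, 3))
R, G, B = X[:, 0], X[:, 1], X[:, 2]
y = ((R > 60) & (R < 155) & (G < 100) & (B > 160)).astype(int)  # rule 0
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# each printed path is one learned rule, e.g. "B <= 160.5 ===> class 0 (no)"
print(export_text(tree, feature_names=["R", "G", "B"]))
print(f"test accuracy: {tree.score(X_te, y_te):.2%}")
```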
- Decision Tree: 96.25%
- Random Forest: 98.75%
- Extra Trees: 93.75%
- Multinomial NB: 92.50%
- QuadraticDiscriminantAnalysis: 97.50%
- Linear: 97.50%
- SVR with poly kernel: 63.75%
- SVR with rbf kernel: 53.75%
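The comparison above can be reproduced with a loop over the models. The model names come from the notes; the seed, the stratified split, and thresholding the regressors' outputs at 0.5 to get class labels are all assumptions, so the exact percentages will differ:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)  # assumed seed
X = rng.integers(0, 256, size=(200, 3))
R, G, B = X[:, 0], X[:, 1], X[:, 2]
y = ((R > 60) & (R < 155) & (G < 100) & (B > 160)).astype(int)  # rule 0
# stratify so both classes appear in train and test (assumed)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.6, random_state=0, stratify=y)

classifiers = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(random_state=0),
    "Multinomial NB": MultinomialNB(),
    "QDA": QuadraticDiscriminantAnalysis(),
}
# Linear/SVR are regressors: round their output to 0/1 to score accuracy
regressors = {
    "Linear": LinearRegression(),
    "SVR (poly)": SVR(kernel="poly"),
    "SVR (rbf)": SVR(kernel="rbf"),
}

results = {}
for name, clf in classifiers.items():
    results[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
for name, reg in regressors.items():
    pred = (reg.fit(X_tr, y_tr).predict(X_te) >= 0.5).astype(int)
    results[name] = (pred == y_te).mean()

for name, acc in results.items():
    print(f"{name}: {acc:.2%}")
```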
- Random Forest > Decision Tree: an ensemble of trees averages out the variance of a single tree
- Extra Trees is unstable and usually loses to the plain decision tree here
- NB is not suitable for this kind of data
- The linear regression model handles this problem well
- The other regression models (SVR with poly/rbf kernels) fail on this problem