Example code and personal notes taken while following the course "Intro to Machine Learning" on Udacity.
Entropy controls how a decision tree decides where to split the data.
Definition: Measure of impurity in a bunch of examples.
Entropy = -Σᵢ pᵢ · log₂(pᵢ), where pᵢ is the fraction of examples in class i.
⭐️ All examples are same class 👉 Entropy = 0
⭐️ Examples are evenly split between classes 👉 Entropy = 1.0
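A minimal sketch of this formula in Python (the slow/fast labels are made-up examples, not from the course data):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy = -sum over classes of p_i * log2(p_i)."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

print(entropy(["slow", "slow", "slow", "slow"]))  # 0.0 -- all one class
print(entropy(["slow", "slow", "fast", "fast"]))  # 1.0 -- evenly split
```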
Information gain = entropy(parent) - weighted average of entropy(children)
The question is: Where to make that split?
Decision tree algorithm: MAXIMIZE INFORMATION GAIN.
This is how we choose which feature to split on.
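Reusing the entropy helper sketched above, picking the best split amounts to computing the gain for each candidate split (the labels are again illustrative):

```python
def information_gain(parent, children):
    """entropy(parent) minus the weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A split that separates the classes perfectly gives the maximum gain
print(information_gain(["slow", "slow", "fast", "fast"],
                       [["slow", "slow"], ["fast", "fast"]]))  # 1.0
# A split that leaves the children as mixed as the parent gains nothing
print(information_gain(["slow", "slow", "fast", "fast"],
                       [["slow", "fast"], ["slow", "fast"]]))  # 0.0
```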
High bias practically ignores the data: the model has almost no capacity to learn anything, which is not so good. High variance is the opposite extreme: the model is so perceptive to the data that it essentially memorizes it and generalizes poorly. You want a sweet spot between bias and variance.
- You can build bigger classifiers out of decision trees with so-called ensemble methods.
- Prone to overfitting: if your data has lots and lots of features, a complicated decision tree can overfit the data, so be careful with the parameters you tune (see the sketch after this list).
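A minimal sketch of both points, assuming scikit-learn (the toy features, labels, and parameter values are made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy training data, invented for illustration
features_train = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.2], [0.8, 0.1]]
labels_train = ["fast", "fast", "slow", "slow"]

# min_samples_split: a node is split further only if it holds at least this
# many samples; raising it keeps the tree smaller and fights overfitting
clf = DecisionTreeClassifier(min_samples_split=2)
clf.fit(features_train, labels_train)
print(clf.predict([[0.15, 0.85]]))  # ['fast']

# An ensemble method: many randomized trees vote on the final answer
forest = RandomForestClassifier(n_estimators=10)
forest.fit(features_train, labels_train)
print(forest.predict([[0.15, 0.85]]))
```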