This code implements a Decision Tree classifier from scratch in Python. Here's what each part does:
- **Entropy Calculation (`entropy` function):**
  - Calculates the entropy of a given set of labels (`y`) using the formula for entropy in information theory: H(y) = -Σ p_i log2(p_i), where p_i is the proportion of samples with label i.
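The original source isn't reproduced here, but a minimal sketch of such an `entropy` function, assuming NumPy and non-negative integer class labels, might look like this:

```python
import numpy as np

def entropy(y):
    """Shannon entropy: H(y) = -sum(p_i * log2(p_i)) over class proportions p_i."""
    # np.bincount assumes non-negative integer class labels.
    counts = np.bincount(y)
    probabilities = counts / len(y)
    # Skip zero-probability classes to avoid log2(0).
    return -sum(p * np.log2(p) for p in probabilities if p > 0)
```

For example, `entropy(np.array([0, 0, 1, 1]))` returns `1.0`, the maximum for two equally likely classes.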
- **Node Class:**
  - Represents a node in the decision tree.
  - Stores information about each node: the feature to split on, the threshold value, the left and right child nodes, and a value (set only when the node is a leaf).
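A representative `Node` class consistent with that description (the attribute names and the `is_leaf_node` convenience method are assumptions, since the original code isn't shown):

```python
class Node:
    """A single node in the decision tree."""

    def __init__(self, feature=None, threshold=None, left=None, right=None, *, value=None):
        self.feature = feature      # index of the feature this node splits on
        self.threshold = threshold  # threshold value for the split
        self.left = left            # subtree for samples with feature value <= threshold
        self.right = right          # subtree for samples with feature value > threshold
        self.value = value          # predicted class label if this is a leaf, else None

    def is_leaf_node(self):
        return self.value is not None
```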
- **DecisionTree Class:**
  - Initializes and builds the decision tree.
  - Contains methods to fit the tree to the training data (`fit`), predict labels for new data (`predict`), and grow the tree recursively (`_grow_tree`); a consolidated sketch follows the `_grow_tree` item below.
- **`fit` Method:**
  - Grows the tree from the root node using the `_grow_tree` method.
- **`_grow_tree` Method:**
  - Recursively builds the decision tree, stopping when the maximum depth is reached, the node contains only one unique label, or there are too few samples left to split.
  - Otherwise, conducts a greedy search to find the best feature and threshold to split on, creating child nodes accordingly.
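Putting these three items together, here is a sketch of the class skeleton, `fit`, and `_grow_tree`, reusing the `Node` and `entropy` sketches above. The hyperparameter names (`min_samples_split`, `max_depth`) are assumptions matching the stopping conditions just described, and the helper methods it calls are sketched after the next list item:

```python
import numpy as np

class DecisionTree:
    def __init__(self, min_samples_split=2, max_depth=100):
        self.min_samples_split = min_samples_split  # fewest samples a node may have and still split
        self.max_depth = max_depth                  # cap on recursion depth
        self.root = None

    def fit(self, X, y):
        # Grow the tree from the root node.
        self.root = self._grow_tree(X, y)

    def _grow_tree(self, X, y, depth=0):
        n_samples = X.shape[0]
        n_labels = len(np.unique(y))

        # Stop if the depth cap is hit, the node is pure,
        # or there are too few samples left to split.
        if (depth >= self.max_depth
                or n_labels == 1
                or n_samples < self.min_samples_split):
            return Node(value=self._most_common_label(y))

        # Greedy search for the most informative feature/threshold pair.
        best_feature, best_threshold = self._best_criteria(X, y)
        if best_feature is None:  # no split improves purity
            return Node(value=self._most_common_label(y))

        # Partition the samples and recurse into the two children.
        left_idxs, right_idxs = self._split(X[:, best_feature], best_threshold)
        left = self._grow_tree(X[left_idxs, :], y[left_idxs], depth + 1)
        right = self._grow_tree(X[right_idxs, :], y[right_idxs], depth + 1)
        return Node(best_feature, best_threshold, left, right)
```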
- **Other Helper Methods:**
  - `_best_criteria`: Determines the best feature and threshold to split on based on information gain.
  - `_information_gain`: Calculates the information gain of a split using entropy.
  - `_split`: Splits the data into left and right subsets based on a threshold value.
  - `_most_common_label`: Finds the most common label in a given set.
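These helpers would sit inside the same `DecisionTree` class and reuse the `entropy` function from the first sketch. One detail worth noting: in this sketch `_best_criteria` returns `(None, None)` when no split yields positive gain, which `_grow_tree` above treats as a leaf.

```python
    def _best_criteria(self, X, y):
        """Greedy search over every feature and candidate threshold,
        keeping the pair with the highest information gain."""
        best_gain = 0.0
        split_idx, split_threshold = None, None
        for feature_idx in range(X.shape[1]):
            X_column = X[:, feature_idx]
            for threshold in np.unique(X_column):
                gain = self._information_gain(y, X_column, threshold)
                if gain > best_gain:
                    best_gain = gain
                    split_idx, split_threshold = feature_idx, threshold
        return split_idx, split_threshold

    def _information_gain(self, y, X_column, threshold):
        """Parent entropy minus the size-weighted entropy of the two children."""
        left_idxs, right_idxs = self._split(X_column, threshold)
        if len(left_idxs) == 0 or len(right_idxs) == 0:
            return 0.0  # degenerate split: everything landed on one side
        n = len(y)
        child_entropy = (len(left_idxs) / n) * entropy(y[left_idxs]) \
                      + (len(right_idxs) / n) * entropy(y[right_idxs])
        return entropy(y) - child_entropy

    def _split(self, X_column, threshold):
        """Indices of samples that go left (<= threshold) and right (> threshold)."""
        left_idxs = np.argwhere(X_column <= threshold).flatten()
        right_idxs = np.argwhere(X_column > threshold).flatten()
        return left_idxs, right_idxs

    def _most_common_label(self, y):
        """Majority label, used as the prediction at a leaf."""
        values, counts = np.unique(y, return_counts=True)
        return values[np.argmax(counts)]
```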
- **`predict` Method:**
  - Uses the trained decision tree to predict labels for new data by traversing the tree for each sample (sketched together with `_traverse_tree` below).
- **`_traverse_tree` Method:**
  - Traverses the tree recursively to predict the label for a single data point.
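And the two prediction methods, completing the class sketch under the same assumptions:

```python
    def predict(self, X):
        """Route every sample down the tree and collect the leaf labels."""
        return np.array([self._traverse_tree(x, self.root) for x in X])

    def _traverse_tree(self, x, node):
        """Recursively follow the split decisions until a leaf is reached."""
        if node.is_leaf_node():
            return node.value
        if x[node.feature] <= node.threshold:
            return self._traverse_tree(x, node.left)
        return self._traverse_tree(x, node.right)
```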
This implementation constructs a decision tree by recursively choosing splits that maximize information gain, using entropy as a measure of impurity. The tree is then used for prediction based on the learned structure.
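Assuming the sketches above, usage follows the familiar fit/predict pattern (the toy dataset here is purely illustrative):

```python
# Toy dataset: one feature; class 1 iff the feature value exceeds 2.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([0, 0, 1, 1])

tree = DecisionTree(max_depth=3)
tree.fit(X_train, y_train)
print(tree.predict(np.array([[1.5], [3.5]])))  # expected: [0 1]
```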