ML_practice

  1. DECISION TREE (ML)

A decision tree is a popular machine learning algorithm used for both classification and regression tasks. It models decisions based on a tree-like structure where each internal node represents a decision based on a specific feature, each branch represents the outcome of the decision, and each leaf node represents the final predicted output or target value. Decision trees are easy to interpret and visualize, making them a valuable tool in various domains.

Let's go through a simple example of a decision tree for a binary classification problem. Consider a dataset of weather conditions and whether people play tennis or not. The features include Outlook (Sunny, Overcast, Rainy), Temperature (Hot, Mild, Cool), Humidity (High, Normal), and Wind (Weak, Strong). The target variable is whether people play tennis (Yes or No).

Here's a small part of the dataset:

| Outlook  | Temperature | Humidity | Wind   | Play Tennis |
|----------|-------------|----------|--------|-------------|
| Sunny    | Hot         | High     | Weak   | No          |
| Overcast | Hot         | Normal   | Weak   | Yes         |
| Rainy    | Mild        | High     | Weak   | Yes         |
| Sunny    | Mild        | Normal   | Strong | Yes         |
| Rainy    | Cool        | Normal   | Strong | No          |
| Overcast | Cool        | High     | Weak   | Yes         |
| Sunny    | Mild        | High     | Strong | No          |
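For use in the code sketches below, the same sample can be loaded into a pandas DataFrame. This is a minimal sketch; `PlayTennis` is just an illustrative column name for the target.

```python
import pandas as pd

# The seven rows from the table above
data = pd.DataFrame({
    "Outlook":     ["Sunny", "Overcast", "Rainy", "Sunny", "Rainy", "Overcast", "Sunny"],
    "Temperature": ["Hot", "Hot", "Mild", "Mild", "Cool", "Cool", "Mild"],
    "Humidity":    ["High", "Normal", "High", "Normal", "Normal", "High", "High"],
    "Wind":        ["Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "PlayTennis":  ["No", "Yes", "Yes", "Yes", "No", "Yes", "No"],
})
print(data)
```
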
Now, let's build a decision tree based on this dataset:

Choose the Root Node: The feature that provides the best split is chosen as the root node. We calculate this based on metrics like Information Gain or Gini Impurity. For simplicity, let's assume we use the Outlook feature.

```
Root Node: Outlook
```
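To make that choice concrete, here is a minimal sketch of computing Information Gain for a candidate feature over rows stored as dictionaries. `entropy` and `information_gain` are illustrative helper names, not functions from any particular library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy of the
    subsets obtained by splitting `rows` (a list of dicts) on `feature`."""
    labels = [r[target] for r in rows]
    remainder = 0.0
    for value in set(r[feature] for r in rows):
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder
```

The feature with the highest gain across all candidates becomes the root, e.g. `max(features, key=lambda f: information_gain(rows, f, "PlayTennis"))`.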

Create Child Nodes: For each unique value in the chosen feature (Outlook), create a branch and a child node.

```
Root Node: Outlook
|
├── Sunny
├── Overcast
└── Rainy
```

Continue Splitting: For each child node, choose the next best feature for splitting. Repeat this process until each branch leads to a leaf node with a final decision (Play Tennis: Yes/No).

```
Root Node: Outlook
|
├── Sunny
|   └── Humidity
|       ├── High: No
|       └── Normal: Yes
├── Overcast: Yes
└── Rainy
    └── Wind
        ├── Weak: Yes
        └── Strong: No
```

This decision tree can now be used to predict whether people will play tennis based on given weather conditions. If you follow a path from the root node to a leaf, you'll find the predicted outcome. For example:

- If Outlook is Sunny and Humidity is High, the prediction is "No."
- If Outlook is Rainy and Wind is Strong, the prediction is "No."
- If Outlook is Overcast, the prediction is "Yes."

This is a simplified example; in real-world scenarios, decision trees can become more complex with additional features and more sophisticated splitting criteria.
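The same kind of tree can be fit with scikit-learn. A minimal sketch, assuming the `data` DataFrame built above: categorical features are one-hot encoded with `pd.get_dummies` because `DecisionTreeClassifier` expects numeric inputs, and on such a small sample the learned splits may differ slightly from the hand-built tree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# One-hot encode the categorical features
X = pd.get_dummies(data[["Outlook", "Temperature", "Humidity", "Wind"]])
y = data["PlayTennis"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X, y)

# Print an indented view of the learned splits, analogous to the diagram above
print(export_text(clf, feature_names=list(X.columns)))
```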

STEPS OF BUILDING A DECISION TREE: Building a decision tree involves several steps. Here is a general overview of the key steps:

Selecting the Root Node: Choose the feature that provides the best split based on a certain criterion (e.g., Information Gain, Gini Impurity). This feature will be the root node of the decision tree.

Creating Child Nodes: For each unique value of the chosen feature, create a branch and a child node. These child nodes represent the possible values of the selected feature.

Splitting the Dataset: Split the dataset into subsets based on the values of the chosen feature. Each subset corresponds to a branch from the root node.
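With the DataFrame from earlier, this partitioning is one subset per feature value; a minimal sketch:

```python
# One subset of rows per unique value of the chosen feature
subsets = {value: data[data["Outlook"] == value]
           for value in data["Outlook"].unique()}
print(subsets["Sunny"])  # the rows sent down the Sunny branch
```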

Recursive Splitting: For each child node, repeat the process of selecting the best feature to split on and creating child nodes. This step is applied recursively until a stopping condition is met (e.g., a predefined depth is reached, a minimum number of samples in a node, or all samples in a node belong to the same class).
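Putting these steps together, here is a compact ID3-style sketch of the recursive procedure. It reuses the `information_gain` helper sketched earlier, represents the tree as nested dictionaries, and also performs the leaf labeling described in the next step; all names are illustrative.

```python
from collections import Counter

def build_tree(rows, features, target, max_depth=5):
    labels = [r[target] for r in rows]
    # Stopping conditions: pure node, no features left, or depth limit reached
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return Counter(labels).most_common(1)[0][0]  # majority class label
    # Pick the feature with the highest information gain
    best = max(features, key=lambda f: information_gain(rows, f, target))
    tree = {best: {}}
    remaining = [f for f in features if f != best]
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = build_tree(subset, remaining, target, max_depth - 1)
    return tree
```

On the sample above, `build_tree(data.to_dict("records"), ["Outlook", "Temperature", "Humidity", "Wind"], "PlayTennis")` should select Outlook at the root, matching the worked example.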

Assigning Labels to Leaf Nodes: When a leaf node is reached, assign a class label or a target value based on the majority class or the average target value of the samples in that node.

Pruning (Optional): Pruning removes branches or nodes that do not provide significant predictive value. It helps prevent overfitting and makes the tree generalize better to new, unseen data.
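scikit-learn implements cost-complexity pruning through the `ccp_alpha` parameter. A minimal sketch, reusing the encoded `X` and `y` from the example above; the value 0.01 is an arbitrary illustration, and in practice the best alpha is chosen by cross-validation.

```python
from sklearn.tree import DecisionTreeClassifier

# ccp_alpha > 0 enables cost-complexity pruning: branches whose
# contribution is weaker than alpha are collapsed after growing
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
pruned.fit(X, y)

# Candidate alpha values can be enumerated from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)
```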

Completing the Tree: Continue the process of recursive splitting until the tree is fully grown according to the stopping criteria.