ID3 Decision Tree Algorithm
ID3 is a Machine Learning Decision Tree Classification Algorithm that uses two methods to build the model. The two methods are Information Gain and Gini Index.
- Version 1.0.0 - Information Gain Only
- Version 2.0.0 - Gini Index added
- Version 2.0.1 - Documentation Sorted
- Version 2.0.2 - All Sorted
Installation
Install directly from my PyPi
pip install classic-ID3-DecisionTree
Or Clone the Repository and install
python3 setup.py install
Parameters
* X_train
The Training Set array consisting of Features.
* y_train
The Training Set array consisting of Outcome.
* dataset
The Entire DataSet.
Attributes
* DecisionTreeClassifier()
Initialise the instance of Decision Tree Classifier class.
* add_features(dataset, result_col_name)
Add the features to the model by sending the dataset. The model will fetch the column features. The second parameter is the column name of outcome array.
* information_gain(X_train, y_train)
To build the decision tree using Information Gain
* gini_index(X_train, y_train)
To build the decision tree using Gini Index
* predict(y_test)
Predict the Test Set Results
Documentation
1. Install the package
pip install classic-ID3-DecisionTree
2. Import the library
from classic_ID3_decision_tree import DecisionTreeClassifier
3. Create an object for Decision Tree Classifier class
id3 = DecisionTreeClassifier()
4. Add Column Features to the model
id3.add_features(dataset, result_col_name)
5. Build the Decision Tree Model using Information Gain
id3.information_gain(X_train, y_train)
OR
5. Build the Decision Tree Model using Gini Index
id3.gini_index(X_train, y_train)
6. Predict the Test Set Results
y_pred = id3.predict(X_test)
Example Code
0. Download the dataset
Download dataset from here
1. Import the dataset and Preprocess
- import numpy as np
- import matplotlib.pyplot as plt
- import pandas as pd
- dataset = pd.read_csv('house-votes-84.csv')
- rawdataset = pd.read_csv('house-votes-84.csv')
- party = {'republican':0, 'democrat':1}
- vote = {'y':1, 'n':0, '?':0}
- for col in dataset.columns:
- if col != 'party':
- dataset[col] = dataset[col].map(vote)
- dataset['party'] = dataset['party'].map(party)
- X = dataset.iloc[:, 1:17].values
- y = dataset.iloc[:, 0].values
- from sklearn.model_selection import KFold
- kf = KFold(n_splits=5)
- for train_index, test_index in kf.split(X,y):
- X_train, X_test = X[train_index], X[test_index]
- y_train, y_test = y[train_index], y[test_index]
2. Use the ID3 Library
- from classic_ID3_decision_tree import DecisionTreeClassifier
- id3 = DecisionTreeClassifier()
- id3.add_features(dataset, 'party')
- print(id3.features)
- id3.information_gain(X_train, y_train)
- OR
- id3.gini_index(X_train, y_train)
- y_pred = id3.predict(X_test)
Footnotes
You can find the code at my Github.