Implementation of AdaBoost in Python

AdaBoost Implementation

An implementation of AdaBoost whose weak learners are "optimal" decision stumps fit to the training data. After each round, the following are computed:

  1. Current training error for the weighted linear combination predictor at this round
  2. Current testing error for the weighted linear combination predictor at this round
  3. Current test AUC for the weighted linear combination predictor at this round
  4. The local "round" error for the decision stump returned
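The per-round bookkeeping listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code in this repository: labels are taken to be +1/-1, "optimal" is read as an exhaustive search over every feature and threshold for the lowest weighted error, and the names `best_stump`, `stump_predict`, `error_rate`, and `adaboost` are hypothetical. Test AUC is left out to keep the sketch short.

```python
import math


def best_stump(X, y, w):
    """Search every (feature, threshold, polarity) for the lowest weighted error.

    Assumes the weights w sum to 1. Returns (error, feature, threshold, polarity).
    The search is quadratic in the number of rows, so it is only meant for small data.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidates: one threshold below the minimum, then midpoints between distinct values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            # Weighted error of the rule "predict +1 if x[j] > t, else -1".
            err = sum(wi for row, yi, wi in zip(X, y, w)
                      if (1 if row[j] > t else -1) != yi)
            # Flipping the rule (polarity -1) has error 1 - err.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if best is None or e < best[0]:
                    best = (e, j, t, polarity)
    return best


def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] > t else -polarity


def error_rate(ensemble, X, y):
    """Misclassification rate of the weighted linear combination of stumps."""
    wrong = 0
    for row, yi in zip(X, y):
        score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
        if (1 if score >= 0 else -1) != yi:
            wrong += 1
    return wrong / len(X)


def adaboost(train_X, train_y, test_X, test_y, iterations=10):
    n = len(train_X)
    w = [1.0 / n] * n              # start from uniform example weights
    ensemble, history = [], []
    for _ in range(iterations):
        stump = best_stump(train_X, train_y, w)
        # Clamp the round error so the logarithm below stays finite.
        eps = min(max(stump[0], 1e-12), 1 - 1e-12)
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, train_y, train_X)]
        total = sum(w)
        w = [wi / total for wi in w]
        history.append({
            "round_error": stump[0],
            "train_error": error_rate(ensemble, train_X, train_y),
            "test_error": error_rate(ensemble, test_X, test_y),
        })
    return ensemble, history
```

Each entry of `history` holds the round error of the stump chosen in that round together with the train and test error of the combined predictor after that round.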

Supported Python versions

Python 3

Documentation

Load the Spambase dataset and split it into train and test sets

from Datasets import spambase
filename = "data/Spambase dataset/spambase.data"
train_X, train_y, test_X, test_y = spambase(filename)
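For readers without the `Datasets` module at hand, a loader of this shape can be sketched as below. It is a hypothetical stand-in, not the repository's `spambase` function. It relies on the UCI file layout (comma-separated rows with the class label in the last column) and makes several assumptions: the name `load_spambase`, a shuffled 80/20 split, the `seed` parameter, and the mapping of the 0/1 labels to -1/+1.

```python
import csv
import random


def load_spambase(filename, test_fraction=0.2, seed=0):
    """Hypothetical loader: read comma-separated rows and split into train/test.

    Assumption: the last column is the class label (1 = spam, 0 = not spam),
    which is mapped to +1/-1 as commonly used for AdaBoost.
    """
    rows = []
    with open(filename, newline="") as f:
        for record in csv.reader(f):
            if not record:
                continue  # skip blank lines
            *features, label = (float(v) for v in record)
            rows.append((features, 1 if label == 1.0 else -1))

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test, train = rows[:n_test], rows[n_test:]

    train_X = [x for x, _ in train]
    train_y = [y for _, y in train]
    test_X = [x for x, _ in test]
    test_y = [y for _, y in test]
    return train_X, train_y, test_X, test_y
```

The return order matches the call shown above, so `train_X, train_y, test_X, test_y = load_spambase(filename)` could be substituted for experimentation.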

Set up the model (the parameter value shown is the default)

from AdaBoost import AdaBoost
model = AdaBoost(iterations=100)

Train model

model.fit(train_X, train_y, test_X, test_y)

Plot of train and test error versus number of iterations

model.plot_train_test_error()

Plot of final ROC curve

model.plot_ROC_curve()
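An ROC curve and its AUC are usually derived from the real-valued score of the combined predictor rather than from its thresholded labels. The sketch below shows one common way to do this, sweeping a threshold over the scores and integrating with the trapezoidal rule. How `plot_ROC_curve` computes them internally is not stated here, so the function names `roc_curve_points` and `auc` and the +1/-1 label convention are assumptions.

```python
def roc_curve_points(scores, labels):
    """Return (fpr, tpr) lists from sweeping a threshold over the scores.

    labels are +1 (positive) or -1 (negative); both classes must be present.
    """
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    # Visit examples from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Consume all examples tied at this score before emitting a point.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:]))
```

For example, scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, -1, 1, -1]` rank three of the four positive/negative pairs correctly, and `auc(*roc_curve_points(...))` evaluates to 0.75.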

Plot of local round error which reduces after each iteration

model.plot_round_error()

Results

Train error, test error, and test AUC, reported every 25 rounds:

  • Round 0 : Train_err: 0.20760869565217388 Test_err: 0.21064060803474483 AUC: 0.748974795114
  • Round 25 : Train_err: 0.06766304347826091 Test_err: 0.07600434310532034 AUC: 0.978207515077
  • Round 50 : Train_err: 0.060054347826086985 Test_err: 0.07057546145494031 AUC: 0.982347610948
  • Round 75 : Train_err: 0.056793478260869557 Test_err: 0.06297502714440828 AUC: 0.984188340807
  • Round 100 : Train_err: 0.05461956521739131 Test_err: 0.061889250814332275 AUC: 0.985246946034

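Plots of train/test error, the ROC curve, and round error versus number of iterations:

[Figure: train and test error vs. number of iterations]

[Figure: ROC curve]

[Figure: round error vs. number of iterations]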