rasbt/python-machine-learning-book-2nd-edition

Chapter 3, different number of misclassified samples

villesar1 opened this issue · 1 comment

Hi,

chapter 3, First steps with scikit-learn – training a perceptron:

With the provided code (ch03.ipynb), I get more misclassifications than reported in the book:

y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())

OUTPUT: Misclassified samples: 9

  • In the book the number of misclassifications is 3.
  • The erratum for page 55 also pertains to this.


I haven't figured out why I get this deviation. Below is a plot of the decision regions, where it can be seen that something is not right. ![03_01](https://user-images.githubusercontent.com/48717739/54609947-a0660d00-4a5c-11e9-9cf2-a44abfe92934.png)
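For reference, the setup I am running can be sketched as below. This mirrors my understanding of the chapter's pipeline (Iris petal features, 70/30 stratified split, standardization, then `Perceptron`); the exact parameter values are assumptions on my part, not a verbatim copy of ch03.ipynb.

```python
# Sketch of the chapter 3 perceptron setup (parameter values assumed).
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

# Petal length and petal width of the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

# 70/30 split, stratified so class proportions are preserved
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Standardize features using the training-set statistics only
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Train the perceptron and count test-set misclassifications
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
```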
rasbt commented

I just double-checked and you are right. This difference occurs in scikit-learn 0.20; if I use 0.19, I still get 3 misclassified examples. In any case, this is not important for the perceptron: it does not converge when the classes are not linearly separable, so you will get different results depending on the random seed and the number of iterations. I.e., it will cycle through 2–9 misclassifications if you run it longer.
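You can see this cycling behavior by re-fitting the model with different iteration budgets. A minimal sketch, assuming the same Iris setup as in the chapter (the specific `max_iter` values and `tol=None`, which disables early stopping in newer scikit-learn versions, are my additions):

```python
# Sketch: the perceptron's test error varies with the iteration budget
# when the classes are not linearly separable (setup values assumed).
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data[:, [2, 3]], iris.target,
    test_size=0.3, random_state=1, stratify=iris.target)

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

for n in (10, 40, 100, 1000):
    # tol=None forces the full max_iter epochs instead of stopping early
    ppn = Perceptron(max_iter=n, eta0=0.1, random_state=1, tol=None)
    ppn.fit(X_train_std, y_train)
    errors = (y_test != ppn.predict(X_test_std)).sum()
    print('max_iter=%4d -> misclassified: %d' % (n, errors))
```

Depending on your scikit-learn version, the printed counts will differ, which is exactly the point: there is no single "correct" error count for a non-converging perceptron.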

In other words, it's not an issue in your code; it's more a shortcoming of the perceptron algorithm. If you are interested, I have some more details about the perceptron algorithm in my lecture slides: https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/L03_perceptron/L03_perceptron_slides.pdf