In the previous lab, you learned how to build an SVM from scratch. Here, you'll learn how to use scikit-learn to create SVMs!
In this lab you will:
- Use scikit-learn to build an SVM when there are two groups
- Use scikit-learn to build an SVM when there are more than two groups
To start, here's some code using the scikit-learn dataset generator again:
- The first dataset contains the same blobs as the first SVM in the last lab
- The second dataset contains the same blobs as the second SVM (Soft Margin classifier) in the last lab
- The third dataset contains four separate blobs
- The fourth dataset contains slightly different data with two classes, but using the
make_moons()
generator instead of blobs
from sklearn.datasets import make_blobs
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
plt.figure(figsize=(10, 10))
plt.subplot(221)
plt.title('Two Seperable Blobs')
X_1, y_1 = make_blobs(n_features=2, centers=2, cluster_std=1.25, random_state=123)
plt.scatter(X_1[:, 0], X_1[:, 1], c = y_1, s=25)
plt.subplot(222)
plt.title('Two Blobs with Mild Overlap')
X_2, y_2 = make_blobs(n_samples=100, n_features=2, centers=2, cluster_std=3, random_state=123)
plt.scatter(X_2[:, 0], X_2[:, 1], c = y_2, s=25)
plt.subplot(223)
plt.title('Four blobs with Varying Separability')
X_3, y_3 = make_blobs(n_samples=100, n_features=2, centers=4, cluster_std=1.6, random_state=123)
plt.scatter(X_3[:, 0], X_3[:, 1], c=y_3, s=25)
plt.subplot(224)
plt.title('Two Moons with Substantial Overlap')
X_4, y_4 = make_moons(n_samples=100, shuffle=False , noise=0.3, random_state=123)
plt.scatter(X_4[:, 0], X_4[:, 1], c=y_4, s=25)
plt.show()
Let's have a look at our first plot again:
X_1, y_1 = make_blobs(n_features=2, centers=2, cluster_std=1.25, random_state=123)
plt.scatter(X_1[:, 0], X_1[:, 1], c=y_1, s=25);
Now it's time to fit a simple linear support vector machine model on this data. The process is very similar to other scikit-learn models you have built so far: import the class, instantiate, fit, and predict.
- Import
SVC
from scikit-learn'ssvm
module - Instantiate
SVC
(which stands for Support Vector Classification) withkernel='linear'
as the only argument - Call the
.fit()
method with the data as the first argument and the labels as the second.
Note: Typically you should scale data when fitting an SVM model. This is because if some variables have a larger scale than others, they will dominate variables that are on a smaller scale. To read more about this, check out page 3 of this paper. Because these variables are all on a similar scale, we will not apply standardization. However, when performing SVM on a real-world dataset, you should ALWAYS scale the data before fitting a model.
# Your code here
Print the coefficients of the model:
# Print the coefficients
Save the first feature (on the horizontal axis) as X_11
and the second feature (on the vertical axis) as X_12
.
X_11 = None
X_12 = None
When we create plots for the classifier later, we're going to want appropriate scales for the axes. In order to do this, we should see what the minimum and maximum values are for the horizontal and vertical axes. To make the plots not feel cramped, we will subtract the minimum by 1 and add 1 to the maximum. Save these values as X11_min
, X11_max
, X12_min
, and X12_max
.
# Limits for the axes
X11_min, X11_max = None
X12_min, X12_max = None
Next, use NumPy's linspace()
function to generate evenly spaced points between these adjusted min and max values for both X_11
and X_12
. Generating 10 points along each is sufficient.
# Your code here
x11_coord = None
x12_coord = None
Now you'll create an entire grid of points by combining these two arrays using NumPy's meshgrid()
function. It's a straightforward function, but feel free to pull up the documentation if you haven't worked with it before.
# Your code here
X12_C, X11_C = None
Finally, you need to reshape the outputs from meshgrid()
to create a numpy array of the shape (100, 2) that concatenates the coordinates for X11
and X12
together in one numpy object. Use np.c_
and make sure to use .ravel()
first. Use np.shape()
on your resulting object first to verify the resulting shape.
# Your code here
x11x12 = None
# Check the shape
Great! Now we want to get a decision boundary for this particular dataset. Call clf.decision_function()
on x11x12
. It will return the distance to the samples that you generated using np.meshgrid()
. You need to then change the result's shape in a way that you get a (10,10) numpy array. We need to reshape this numpy array because it must be a 2-dimensional shape to function with the contour()
method you will use in the next plot.
# Your code here
Now, let's plot our data again with the result of SVM in it. This is what the following code does, all you need to do is run it.
- The first line is simply creating the scatter plot like before
- Next, we specify that what we'll do next uses the same axes as the scatter plot. We can do this using
plt.gca()
. Store it in an object and we'll use this object to create the lines in the plot - Then we use
.countour()
. The first two arguments are the coordinates created using the meshgrid, the third argument is the result of your decision function call - We'll want three lines: one decision boundary line and the two lines going through the support vectors. We include
levels = [-1,0,1]
to get all three
plt.scatter(X_11, X_12, c=y_1)
axes = plt.gca()
axes.contour(X11_C, X12_C, df1, colors=['blue', 'black', 'blue'],
levels=[-1, 0, 1], linestyles=[':', '-', ':'])
plt.show()
The coordinates of the support vectors can be found in the .support_vectors_
attribute. Have a look:
# Your code here
Now we can recreate your plot and highlight the support vectors:
plt.scatter(X_11, X_12, c=y_1)
axes = plt.gca()
axes.contour(X11_C, X12_C, df1, colors='black', levels=[-1, 0, 1], linestyles=[':', '-', ':'])
axes.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], facecolors='blue')
plt.show()
The previous example was pretty easy. The two "clusters" were easily separable by one straight line classifying every single instance correctly. But what if this isn't the case? Let's look at the second dataset again:
X_2, y_2 = make_blobs(n_samples=100, n_features=2, centers=2, cluster_std=3, random_state=123)
plt.scatter(X_2[:, 0], X_2[:, 1], c=y_2, s=25);
Scikit-learn's SVC
class you used above automatically allows for slack variables. As such, simply repeat the code to fit the SVM model and plot the decision boundary.
# Your code here
As you can see, three instances are misclassified (1 yellow, 2 purple). It may not be possible to improve this, but it's worth experimenting with by changing your hyperparameter C
, which can be done when initializing the classifier. Try it out now; re-instantiate a model object, adding a high value for the argument C
. Specifically, set C = 5,000,000. Then refit the classifier and draw the updated decision boundary.
# Your code here
If you dig deeper into scikit-learn, you'll notice that there are several ways to create linear SVMs for classification:
SVC(kernel = "linear")
, what you've used above. Documentation can be found heresvm.LinearSVC()
, which is very similar to the simple SVC method, but:- Does not allow for the keyword "kernel", as it is assumed to be linear (more on non-linear kernels later)
- In the objective function,
LinearSVC
minimizes the squared hinge loss whileSVC
minimizes the regular hinge loss LinearSVC
uses the one-vs-all (also known as one-vs-rest) multiclass reduction whileSVC
uses the one-vs-one multiclass reduction (this is important only when having > 2 classes!)
svm.NuSVC()
, which is again very similar,- Does have a "kernel" argument
SVC
andNuSVC
are essentially the same thing, except that forNuSVC
,C
is reparametrized intonu
. The advantage of this is that whereC
has no bounds and can be any positive number,nu
always lies between 0 and 1- One-vs-one multiclass approach
So what does one-vs-one mean? What does one-vs-all mean?
- One-vs-one means that with
$n$ classes,$\dfrac{(n)*(n-1)}{2}$ boundaries are constructed! - One-vs-all means that when there are
$n$ classes,$n$ boundaries are created.
The difference between these three types of classifiers is mostly small but generally visible for datasets with 3+ classes. Have a look at our third example and see how the results differ!
plt.figure(figsize=(7, 6))
plt.title('Four blobs')
X, y = make_blobs(n_samples=100, n_features=2, centers=4, cluster_std=1.6, random_state = 123)
plt.scatter(X[:, 0], X[:, 1], c = y, s=25);
Try four different models and plot the results using subplots where: - The first one is a regular SVC (C=1) - The second one is a regular SVC with C=0.1 - The third one is a NuSVC with nu= 0.7 - The fourth one is a LinearSVC (no arguments)
Make sure all these plots have highlighted support vectors, except for LinearSVC (this algorithm doesn't have the attribute .support_vectors_
).
Additionally, be sure to use contourf()
, instead of contour()
to get filled contour plots.
# Your code here
from sklearn import svm
Now, look at the coefficients of the decision boundaries. Remember that a simple SVC
uses a one-vs-one method, this means that for four classes, .coef_
. Compare these with the coefficients for the LinearSVC. What do you notice?
# Your code here
plt.title('Two interleaving half circles')
X_4, y_4 = make_moons(n_samples=100, shuffle = False , noise = 0.3, random_state=123)
plt.scatter(X_4[:, 0], X_4[:, 1], c = y_4, s=25)
plt.show()
Finally, look at the fourth plot. While you can try and draw a line to separate the classes, it's fairly apparent that a linear boundary is not appropriate. In the next section, you'll learn about SVMs with non-linear boundaries!
It is highly recommended to read up on SVMs in the scikit-learn documentation!
- https://scikit-learn.org/stable/modules/svm.html
- https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
- https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
- https://scikit-learn.org/stable/modules/generated/sklearn.svm.NuSVC.html
In this lesson, you explored and practiced how to use scikit-learn to build linear support vector machines. In the next lesson, you'll learn how SVMs can be extended to have non-linear boundaries.