We perform Support Vector Classification (SVC) on the iris dataset, which comprises three classes of flowers, 'setosa', 'versicolor', and 'virginica', with 50 samples each. The dataset has four features: Sepal Length, Sepal Width, Petal Length, and Petal Width.
The dataset has been divided into a training set and a test set.
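A minimal sketch of this setup, assuming scikit-learn's bundled copy of the iris dataset; the 80/20 split ratio and the random seed below are illustrative assumptions, not taken from the text:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset: 150 samples, 4 features, 3 classes of 50 samples each.
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test sets (80/20 split and fixed seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # e.g. (120, 4) (30, 4)
```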
The linear kernel should be employed when the data points are linearly separable. Since it has fewer parameters to learn, it is faster to compute, which makes it particularly well suited to high-dimensional datasets.
| Result | Sepal Length & Sepal Width | Petal Length & Petal Width |
|---|---|---|
| Decision Regions | | |
| Confusion Matrix | | |
| Classification Report | | |
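The linear-kernel results in the table above could be produced with a sketch along the following lines; the feature-pair column indices, the default C, and the split parameters are assumptions, and the decision-region plotting step is omitted:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

iris = load_iris()
X, y = iris.data, iris.target

# Two feature pairs: sepal length & width (columns 0, 1), petal length & width (columns 2, 3).
feature_pairs = {"Sepal Length & Sepal Width": [0, 1],
                 "Petal Length & Petal Width": [2, 3]}

for name, cols in feature_pairs.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X[:, cols], y, test_size=0.2, random_state=42, stratify=y)
    clf = SVC(kernel="linear").fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, target_names=iris.target_names))
```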
Using the petal length & petal width features, where the data points are linearly separable, the classification performance is notably strong.
- RBF Kernel: The RBF kernel is the most generalized form of kernelization and one of the most widely used kernels, owing to its similarity to the Gaussian distribution. For two points $X_1$ and $X_2$, the RBF kernel computes their similarity, i.e., how close they are to each other. It can be mathematically represented as follows: $$K(X_1, X_2) = \exp\left(-\frac{\|X_1 - X_2\|^2}{2\sigma^2}\right)$$
- Polynomial Kernel: In general, the polynomial kernel of degree $d$ with constant term $c$ is defined as: $$K(X_1, X_2) = \left(X_1^\top X_2 + c\right)^d$$ Both kernel functions are evaluated numerically in the short sketch after this list.
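As a small illustration of the two formulas, the kernel values for a pair of points can be computed directly; the values of $\sigma$, $c$, and $d$ below are arbitrary examples, not values used elsewhere in this analysis:

```python
import numpy as np

X1 = np.array([5.1, 3.5, 1.4, 0.2])
X2 = np.array([6.7, 3.0, 5.2, 2.3])

# RBF kernel: exp(-||X1 - X2||^2 / (2 * sigma^2)); sigma is an example value.
sigma = 1.0
rbf = np.exp(-np.sum((X1 - X2) ** 2) / (2 * sigma ** 2))

# Polynomial kernel: (X1 . X2 + c)^d; c and d are example values.
c, d = 1.0, 3
poly = (X1 @ X2 + c) ** d

print("RBF kernel value:", rbf)
print("Polynomial kernel value:", poly)
```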
| Result | Kernel = Poly | Kernel = RBF |
|---|---|---|
| Confusion Matrix | | |
| Classification Report | | |
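A sketch of how these two kernels might be compared on the full feature set, assuming default kernel hyperparameters and the same illustrative train/test split as above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

# Fit and evaluate an SVC with the polynomial and RBF kernels.
for kernel in ["poly", "rbf"]:
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print("Kernel =", kernel)
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, target_names=iris.target_names))
```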
The optimization function for the soft-margin SVM is written as follows:
$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i$$
subject to:
$$y_i\left(w^\top x_i + b\right) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n$$
C is a hyperparameter that determines the trade-off between a lower training error and a larger margin.
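To make this trade-off concrete, the following sketch (with arbitrary example values of C and an assumed train/test split) fits a linear-kernel SVC for several C values and reports the number of support vectors and the test accuracy; a smaller C tolerates more margin violations and typically keeps more support vectors:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

for C in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C}: support vectors={clf.n_support_.sum()}, "
          f"test accuracy={clf.score(X_test, y_test):.3f}")
```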
To determine the optimal 'C' value for each kernel and the best 'gamma' value for the RBF kernel, GridSearchCV is utilized as shown below.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Candidate values for C (all kernels) and gamma (ignored by the linear kernel).
param = {'C': [0.1, 1, 10, 100], 'gamma': [0.1, 0.01, 0.001, 0.0001]}
kernels = ['linear', 'poly', 'rbf']
best_param = {}

for kernel in kernels:
    # Grid-search the hyperparameters for each kernel on the training set,
    # then evaluate the refitted best model on the test set.
    model = SVC(kernel=kernel)
    grid_search = GridSearchCV(model, param_grid=param)
    grid_search.fit(X_train, y_train)
    best_param[kernel] = grid_search.best_params_
    print("Optimal hyperparameters for Kernel = " + str(kernel) + ":" + str(best_param[kernel]))
    print("Accuracy on test set for Kernel = " + str(kernel) + ":" + str(grid_search.score(X_test, y_test) * 100) + "%\n")
Optimal hyperparameters for Kernel = linear:{'C': 1, 'gamma': 0.1}
Optimal hyperparameters for Kernel = poly:{'C': 0.1, 'gamma': 0.1}
Optimal hyperparameters for Kernel = rbf:{'C': 100, 'gamma': 0.01}
Since the dataset is multi-class, two methods, One Vs. Rest and One Vs. One, are employed to classify the data, with the "decision_function_shape" parameter set accordingly ("ovr" or "ovo"). The classification results for each method are shown below, first One Vs. Rest and then One Vs. One.
One Vs. Rest:

| Result | Kernel = Linear | Kernel = Poly | Kernel = RBF |
|---|---|---|---|
| Confusion Matrix | | | |
| Classification Report | | | |
One Vs. One:

| Result | Kernel = Linear | Kernel = Poly | Kernel = RBF |
|---|---|---|---|
| Confusion Matrix | | | |
| Classification Report | | | |
In this particular problem, the One Vs. One method exhibits slightly better classification performance.
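A minimal sketch of configuring the two strategies, reusing the RBF hyperparameters found above; note that scikit-learn's SVC always trains one-vs-one classifiers internally, and "decision_function_shape" only controls the form of the decision function it exposes:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target)

for shape in ["ovr", "ovo"]:
    # 'ovr' returns one score per class; 'ovo' returns one score per class pair.
    clf = SVC(kernel="rbf", C=100, gamma=0.01, decision_function_shape=shape)
    clf.fit(X_train, y_train)
    print(shape, "decision function shape:", clf.decision_function(X_test).shape)
    print(shape, "test accuracy:", clf.score(X_test, y_test))
```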