This short lesson summarizes key takeaways from section 33.
You will be able to:
- Understand and explain what was covered in this section
- Understand and explain why this section will help you become a data scientist
The key takeaways from this section include:
- Support Vector Machines can be used for regression tasks, although they're better known as a powerful algorithm for classification.
- SVMs work by maximizing the margin between the decision boundary and the nearest data points (the support vectors)
- For data that isn't linearly separable (no single straight line can classify all of the observations into the correct categories), a soft margin classifier can be used, allowing a model that may misclassify some of the training data points
- For data sets where a linear decision boundary isn't very useful, you can improve the performance of your SVMs by using the kernel trick
- With the kernel trick, you project your data set into a higher-dimensional space
- The Gaussian/Radial Basis Function (RBF) kernel provides you with two hyperparameters: C and gamma
- C allows you to trade off between misclassification of the training set and simplicity of the decision function (to avoid overfitting). It's common to all kernels
- gamma allows you to define how much influence a single training example has
- The polynomial kernel is specified as
$$(\gamma \langle x, x' \rangle + r)^d$$
- The sigmoid kernel is specified as
$$\tanh(\gamma \langle x, x' \rangle + r)$$
- Based on the "no free lunch" theorem, there's no kernel that is guaranteed to be better than the others for a given training set, but the RBF kernel is often a good default to start with
- In Scikit-learn, the SVC class is one classifier that supports multiple kernels. It takes C as the penalty parameter and additional kernel-specific parameters such as degree (for the polynomial kernel) and gamma (for the RBF, polynomial, and sigmoid kernels): https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html (a short usage sketch appears after this list)
- There are other implementations of SVM built into Scikit-learn. For example:
    - NuSVC introduces $\nu$ (pronounced "nu"), which sets an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors
    - LinearSVC is a "one-vs-rest" method which often scales better for cases with many potential classes (a comparison sketch appears at the end of this lesson)
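
To tie these takeaways together, here is a minimal sketch showing how the kernel choice and the C and gamma hyperparameters are passed to SVC. It assumes scikit-learn is installed; the toy `make_circles` data set and the specific parameter values are illustrative choices, not part of the lesson.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data that is not linearly separable: two concentric circles
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A linear kernel struggles here because no straight line separates the classes
linear_clf = SVC(kernel='linear', C=1.0)

# The RBF kernel implicitly projects the data into a higher-dimensional space;
# C trades off misclassification against a simpler decision function,
# and gamma controls how much influence a single training example has
rbf_clf = SVC(kernel='rbf', C=1.0, gamma='scale')

# The polynomial kernel (gamma * <x, x'> + r)^d exposes degree (d) and coef0 (r)
poly_clf = SVC(kernel='poly', degree=3, gamma='scale', coef0=1.0)

for name, clf in [('linear', linear_clf), ('rbf', rbf_clf), ('poly', poly_clf)]:
    clf.fit(X_train, y_train)
    print(f"{name} kernel test accuracy: {clf.score(X_test, y_test):.3f}")
```

On data like this, you should see the RBF and polynomial kernels clearly outperform the linear kernel, which is the motivation for the kernel trick described above.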
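
The alternative implementations can be swapped in with only small changes. The sketch below, again assuming scikit-learn is installed and using an illustrative `make_classification` toy data set (not from the lesson), shows NuSVC and LinearSVC alongside SVC.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, NuSVC, LinearSVC

# A simple multi-class toy problem, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # Standard SVC with the RBF kernel and the C penalty parameter
    'SVC': SVC(kernel='rbf', C=1.0, gamma='scale'),
    # NuSVC replaces C with nu, an upper bound on the fraction of training
    # errors and a lower bound on the fraction of support vectors
    'NuSVC': NuSVC(nu=0.3, kernel='rbf', gamma='scale'),
    # LinearSVC is a linear, one-vs-rest implementation that often scales
    # better when there are many samples or many classes
    'LinearSVC': LinearSVC(C=1.0, max_iter=10000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name} test accuracy: {model.score(X_test, y_test):.3f}")
```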