Machine Learning and Artificial Intelligence - Homework #2 - A.Y. 2018/2019 - Politecnico di Torino
- Load Iris dataset
- Simply select the first two dimensions (let’s skip PCA this time)
- Randomly split data into train, validation and test sets in proportion 5:2:3
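The loading and 5:2:3 split could be sketched as follows with scikit-learn (the `random_state` and variable names are my own choices, not prescribed by the assignment):

```python
# Load Iris, keep the first two features, split 50/20/30 (train/val/test).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # first two dimensions only, no PCA

# First cut off 50% for training, then split the rest 2:3 into val/test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, train_size=0.5, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.6, random_state=42, stratify=y_tmp)
```

With 150 samples this yields 75 training, 30 validation, and 45 test points.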
- For C from 10^(-3) to 10^3: (multiplying at each step by 10)
- Train a linear SVM on the training set.
- Plot the data and the decision boundaries
- Evaluate the method on the validation set
- Plot a graph showing how the accuracy on the validation set varies when changing C
- How do the boundaries change? Why?
- Use the best value of C and evaluate the model on the test set. How well does it perform?
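The C sweep for the linear SVM could look like the sketch below (assuming scikit-learn's `SVC`; the split parameters repeat the earlier step and are my own choices):

```python
# Linear SVM: sweep C over 1e-3 ... 1e3, pick the best C on validation,
# then score that model once on the test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, train_size=0.5, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.6, random_state=42, stratify=y_tmp)

C_values = [10.0 ** k for k in range(-3, 4)]  # multiply by 10 each step
val_scores = {}
for C in C_values:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    val_scores[C] = clf.score(X_val, y_val)
    # Decision boundaries can be drawn by evaluating clf.predict on a
    # np.meshgrid over the two-feature plane (plotting code omitted here).

best_C = max(val_scores, key=val_scores.get)
test_acc = SVC(kernel="linear", C=best_C).fit(X_train, y_train).score(X_test, y_test)
```

Plotting `val_scores` against `C_values` (log scale on the x axis) gives the accuracy-vs-C graph requested above.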
- Repeat the C sweep above (train, plot, evaluate), but this time use an RBF kernel.
- Evaluate the best C on the test set.
- Are there any differences compared to the linear kernel? How are the boundaries different?
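The RBF version reuses the same sweep; only the kernel changes. A minimal sketch (same assumed split as before):

```python
# Same C sweep as the linear case, but with the RBF kernel
# (gamma is left at scikit-learn's default here; it is tuned later).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, train_size=0.5, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.6, random_state=42, stratify=y_tmp)

C_values = [10.0 ** k for k in range(-3, 4)]
val_scores = {C: SVC(kernel="rbf", C=C).fit(X_train, y_train).score(X_val, y_val)
              for C in C_values}
best_C = max(val_scores, key=val_scores.get)
test_acc = SVC(kernel="rbf", C=best_C).fit(X_train, y_train).score(X_test, y_test)
```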
- Perform a grid search of the best parameters for an RBF kernel: we will now tune both gamma and C at the same time. Select an appropriate range for both parameters. Train the model and score it on the validation set.
- Show a table of how these parameter combinations score on the validation set.
- Evaluate the best parameters on the test set. Plot the decision boundaries.
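The joint grid search over gamma and C, scored on the held-out validation set, could be sketched as below (the parameter ranges are illustrative choices, not prescribed by the assignment):

```python
# Grid search over (gamma, C) for an RBF SVM, scored on the validation set.
# The scores matrix is the "table" of validation accuracies.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, train_size=0.5, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.6, random_state=42, stratify=y_tmp)

C_range = [10.0 ** k for k in range(-3, 4)]      # example range
gamma_range = [10.0 ** k for k in range(-5, 2)]  # example range
scores = np.zeros((len(gamma_range), len(C_range)))
for i, gamma in enumerate(gamma_range):
    for j, C in enumerate(C_range):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)
        scores[i, j] = clf.score(X_val, y_val)

i, j = np.unravel_index(scores.argmax(), scores.shape)
best_gamma, best_C = gamma_range[i], C_range[j]
test_acc = SVC(kernel="rbf", C=best_C, gamma=best_gamma).fit(
    X_train, y_train).score(X_test, y_test)
```

The `scores` matrix (rows: gamma, columns: C) can be printed or shown as a heatmap to produce the requested table.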
- Merge the training and validation splits. You should now have 70% training and 30% test data.
- Repeat the grid search for gamma and C, but this time perform 5-fold cross-validation.
- Evaluate the parameters on the test set. Is the final score different? Why?
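The 5-fold variant can be done with scikit-learn's `GridSearchCV` over the merged 70% training split (again, the ranges and `random_state` are my own illustrative choices):

```python
# Merge train+val into a 70% training set, then grid-search (C, gamma)
# with 5-fold cross-validation and score the best model on the 30% test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, :2]
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

param_grid = {"C": [10.0 ** k for k in range(-3, 4)],
              "gamma": [10.0 ** k for k in range(-5, 2)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_trainval, y_trainval)  # refits on all 70% with the best params

test_acc = search.score(X_test, y_test)
```

Because each parameter pair is now scored as an average over 5 folds rather than on a single fixed validation split, the selected parameters (and hence the final test score) may differ from the previous step.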