Machine-Learning-Individual-Project

BEMACS Individual Assignment - Machine Learning course

The dataset contains 1100 samples with 20 features each. The first column is the sample ID, the second column is the label, which takes one of 5 possible values, and the remaining columns are numeric features.

Your task is to compare the performance of the k-nearest neighbors algorithm (implemented by sklearn.neighbors.KNeighborsClassifier) with that of a random forest (implemented by sklearn.ensemble.RandomForestClassifier). Optimize the parameters of both algorithms and determine which one performs better on this dataset. By the end of the analysis, you should have chosen an algorithm and its optimal set of parameters; state this choice explicitly in the conclusions of your notebook.
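One possible shape for this comparison is sketched below. It is an illustration only, not the required procedure. Several things are assumptions: the data is a synthetic stand-in generated with `make_classification` (the real file name and column names are not given here), the parameter grids are arbitrary starting points, and the features are standardized before k-NN because it is distance-based.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in with the same shape as the assignment data
# (1100 samples, 20 numeric features, 5 classes). Replace with the real dataset.
X, y = make_classification(
    n_samples=1100, n_features=20, n_informative=10, n_classes=5, random_state=0
)

# Hold out a test set that is never touched during parameter tuning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Assumption: these grids are illustrative, not recommended values.
searches = {
    "knn": GridSearchCV(
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {
            "kneighborsclassifier__n_neighbors": [1, 3, 5, 9, 15],
            "kneighborsclassifier__weights": ["uniform", "distance"],
        },
        cv=5,
    ),
    "rf": GridSearchCV(
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_features": ["sqrt", 0.5]},
        cv=5,
    ),
}

# Tune each model with 5-fold cross-validation on the training split only.
results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
    }

# Pick the model by cross-validated accuracy, then score it once on the test set.
best_name = max(results, key=lambda n: results[n]["cv_accuracy"])
test_accuracy = searches[best_name].score(X_test, y_test)

print(results)
print(f"chosen: {best_name}, held-out accuracy: {test_accuracy:.3f}")
```

Selecting the model on cross-validated accuracy and reporting the held-out score only once keeps the final number from being biased by the tuning itself.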

Your notebook should detail the procedure you have used to choose the optimal parameters (graphs are a good idea when possible/sensible).
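As an example of such a graph, the sketch below plots a validation curve for the number of neighbors in k-NN. The synthetic data, the range of `k` values, and the choice of accuracy as the metric are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier

# Assumption: synthetic stand-in for the assignment data.
X, y = make_classification(
    n_samples=1100, n_features=20, n_informative=10, n_classes=5, random_state=0
)

# Assumption: odd values of k from 1 to 31 as an illustrative range.
k_values = np.arange(1, 32, 2)

# Each returned array has shape (len(k_values), number of CV folds).
train_scores, val_scores = validation_curve(
    KNeighborsClassifier(),
    X,
    y,
    param_name="n_neighbors",
    param_range=k_values,
    cv=5,
    scoring="accuracy",
)

mean_val = val_scores.mean(axis=1)
std_val = val_scores.std(axis=1)
best_k = int(k_values[mean_val.argmax()])

fig, ax = plt.subplots()
ax.plot(k_values, train_scores.mean(axis=1), label="training accuracy")
ax.plot(k_values, mean_val, label="cross-validation accuracy")
# Shade one standard deviation across folds around the validation mean.
ax.fill_between(k_values, mean_val - std_val, mean_val + std_val, alpha=0.2)
ax.axvline(best_k, linestyle="--", color="gray", label=f"best k = {best_k}")
ax.set_xlabel("n_neighbors")
ax.set_ylabel("accuracy")
ax.legend()
# In a notebook the figure renders when the cell finishes;
# in a plain script, call plt.show() here.
```

The gap between the training and cross-validation curves shows where small values of `k` overfit, which is easier to argue from a plot than from a table of scores.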

The notebook will be evaluated not only on the final results but also on the procedure employed. The procedure should balance practical considerations (it may not be feasible to explore every combination of parameters exhaustively) against the goal of achieving the best possible performance in the least amount of time.
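One way to keep the search cost under control is a randomized search with a fixed number of sampled configurations instead of an exhaustive grid. The sketch below is a minimal example under stated assumptions: the data is synthetic, the parameter distributions and the budget `N_ITER` are arbitrary, and wall-clock time is measured with `time.perf_counter`.

```python
import time

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Assumption: synthetic stand-in for the assignment data.
X, y = make_classification(
    n_samples=1100, n_features=20, n_informative=10, n_classes=5, random_state=0
)

# Assumption: illustrative distributions, not tuned recommendations.
param_distributions = {
    "n_estimators": randint(20, 120),
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": randint(1, 10),
    "max_features": ["sqrt", "log2"],
}

# The budget: how many random configurations to evaluate (hypothetical value).
N_ITER = 8

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=N_ITER,
    cv=3,
    scoring="accuracy",
    random_state=0,
)

# Time the whole search so cost can be reported next to accuracy.
start = time.perf_counter()
search.fit(X, y)
elapsed = time.perf_counter() - start

print(f"evaluated {N_ITER} configurations in {elapsed:.1f} s")
print(f"best CV accuracy: {search.best_score_:.3f}")
print(f"best parameters: {search.best_params_}")
```

Reporting the elapsed time alongside the best score makes the trade-off explicit: a coarse random search can locate a promising region cheaply, and a small grid around that region can refine it if the extra time is justified.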