Code for Data Mining Project, March 2020
Dataset: Mice Protein Expression Data Set from https://archive.ics.uci.edu/ml/datasets/Mice+Protein+Expression

Aim: analyse the dataset and identify interesting features using various R packages. The work also covers:
- Dealing with missing values and categorical variables.
- Building various models with 10-fold cross-validation (Random Forest, SVM, KNN, Neural Networks and Naive Bayes).
- Building another model after performing PCA.
- Studying the behaviour of the models and a final analysis of their performance on a high-dimensional dataset.
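
The steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it uses synthetic data in place of the Mice Protein Expression file, median imputation as one possible way to handle missing values, and KNN from the `class` package as a stand-in for the listed models. All variable and function names (`x`, `cls`, `impute_median`, `cv_accuracy`, the `protein_*` column names) are hypothetical.

```r
# Minimal sketch on synthetic data; every name below is an illustrative assumption.
library(class)  # recommended package shipped with R; provides knn()

set.seed(42)
n <- 200
p <- 20

# Synthetic stand-in for the expression matrix: two classes, first 5 features shifted.
cls <- factor(rep(c("class_a", "class_b"), each = n / 2))
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
x[cls == "class_b", 1:5] <- x[cls == "class_b", 1:5] + 2
colnames(x) <- paste0("protein_", seq_len(p))

# Knock out about 5% of the entries to imitate missing measurements.
x[sample(length(x), size = round(0.05 * length(x)))] <- NA

# Step 1: replace each column's missing values with that column's median.
impute_median <- function(m) {
  apply(m, 2, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
}
x_imp <- impute_median(x)

# Step 2: assign every row to one of 10 folds of equal size.
k_folds <- 10
fold_id <- sample(rep(seq_len(k_folds), length.out = n))

# Step 3: 10-fold cross-validated KNN accuracy, optionally after PCA.
cv_accuracy <- function(x, y, fold_id, n_comp = NULL) {
  acc <- sapply(sort(unique(fold_id)), function(f) {
    test <- fold_id == f
    train_x <- x[!test, , drop = FALSE]
    test_x <- x[test, , drop = FALSE]
    if (!is.null(n_comp)) {
      # Fit PCA on the training fold only, then project the held-out fold.
      pca <- prcomp(train_x, center = TRUE, scale. = TRUE)
      train_x <- pca$x[, seq_len(n_comp), drop = FALSE]
      test_x <- predict(pca, test_x)[, seq_len(n_comp), drop = FALSE]
    }
    pred <- knn(train_x, test_x, cl = y[!test], k = 5)
    mean(pred == y[test])
  })
  mean(acc)
}

acc_raw <- cv_accuracy(x_imp, cls, fold_id)              # all 20 features
acc_pca <- cv_accuracy(x_imp, cls, fold_id, n_comp = 5)  # first 5 principal components
```

Comparing `acc_raw` with `acc_pca` gives a rough sense of how much accuracy is kept after reducing the dimension. For simplicity the sketch imputes missing values once on the full matrix; a stricter setup would compute the medians inside each training fold, as is done for the PCA step.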