Machine Learning Final Project
This project applies data mining to student records to build a model that predicts students' academic performance. We analyze a dataset of student records from a high school in Portugal. The data are labeled and numeric, were collected from Kaggle, and comprise 395 instances with 28 attributes and 1 class attribute. The dataset has no missing values. The final dataset was produced by removing outliers and scaling the data.
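The two preprocessing steps named above can be sketched as follows. The text does not say which outlier rule or scaling method was used, so this minimal example assumes Tukey's 1.5 × IQR rule and z-score standardization. The function names and the sample `absences` values are hypothetical and are not taken from the dataset.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule: Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Rescale values to zero mean and unit variance (assumed scaling: z-scores)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical values for a single numeric attribute.
absences = [0, 2, 4, 4, 6, 6, 8, 10, 12, 75]

cleaned = remove_outliers_iqr(absences)  # drops the extreme value 75
scaled = standardize(cleaned)
```

In practice the same two functions would be applied column by column. The scaling statistics should be computed on the training portion only, so that no information from the test set enters the model.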
We built models with eight classifiers: Naïve Bayes, KNN, SVM, kernel SVM, ANN, logistic regression, decision tree, and random forest. The data were split into training and test sets in a 75:25 ratio. Of these, the decision tree and random forest classifiers performed best, each reaching 100% accuracy on the test set. Finally, the ROC curves and AUC values of the applied algorithms were compared. A dataset with more instances would allow the performance metrics to be evaluated more reliably.
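The evaluation protocol described above (a 75:25 split, test accuracy, and AUC) can be illustrated with a small dependency-free sketch. The text does not say which library or random seed was used, so the helper names, the seed, and the toy labels and scores below are assumptions for illustration only. The AUC is computed from its rank interpretation: the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one.

```python
import random


def split_75_25(rows, seed=0):
    """Shuffle a copy of the rows and split them 75:25 into train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC for binary labels (0/1): share of positive-negative pairs ranked
    correctly, with ties counted as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 395 placeholder row indices, matching the number of instances in the dataset.
train, test = split_75_25(range(395))

# Toy labels and classifier scores, purely illustrative.
acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With 395 instances this split gives 296 training rows and 99 test rows, so each test-set metric rests on fewer than 100 predictions.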
Fig: Flowchart of the solution
Fig: The preprocessed student-related variables
Fig: Individual histograms of the attributes
Fig: Combined histogram of all attributes
Fig: Bar plot of the class attribute
Fig: Confusion matrices for the Decision Tree and Random Forest classifiers