Mining-Student-Data-to-Predict-Result

Machine Learning Final Project

This project aims to build a model that predicts the academic performance of students by mining student data. We analyze a dataset of student records from a high school in Portugal. The data, collected from Kaggle, is labelled (supervised) and numeric, with 395 instances, 28 attributes and 1 class attribute, and has no missing values. The final dataset was generated by removing outliers and scaling the data.
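The README does not say which outlier rule or scaler was used. As a minimal sketch, assuming the common 1.5×IQR rule applied to selected numeric columns and z-score standardization, the preprocessing step might look like this. The function names `remove_outliers_iqr` and `standardize` and the toy data are hypothetical, not taken from the project.

```python
import numpy as np


def remove_outliers_iqr(X, y, cols, k=1.5):
    """Keep only rows whose values in `cols` lie inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumed rule (1.5*IQR); `cols` is a list of column indices to test.
    Returns the filtered X and the matching labels y.
    """
    sub = X[:, cols]
    q1 = np.percentile(sub, 25, axis=0)
    q3 = np.percentile(sub, 75, axis=0)
    iqr = q3 - q1
    inside = (sub >= q1 - k * iqr) & (sub <= q3 + k * iqr)
    keep = np.all(inside, axis=1)  # a row survives only if every tested column is inside
    return X[keep], y[keep]


def standardize(X):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns are only centred, avoiding division by zero
    return (X - mean) / std


# Toy example: column 0 is numeric, column 1 is a binary attribute.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [100.0, 0.0]])
y = np.array([0, 1, 0, 1, 1])
X_clean, y_clean = remove_outliers_iqr(X, y, cols=[0])  # drops the row with 100.0
X_scaled = standardize(X_clean)
```

Only numeric columns are passed in `cols`: if one value dominates a binary attribute, Q1 and Q3 coincide, the IQR is zero, and every row with the other value would be flagged as an outlier.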

We built models with Naïve Bayes, KNN, SVM, Kernel SVM, ANN, Logistic Regression, Decision Tree and Random Forest classifiers, using a 75:25 train/test split. Decision Tree and Random Forest performed best, each reaching 100% accuracy on the test set. Finally, the ROC curves and AUC values of all applied algorithms are compared. A dataset with more instances would allow the performance metrics to be evaluated more reliably.
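The classifier comparison can be sketched with scikit-learn. This is an illustration under stated assumptions, not the project's code: the data is a synthetic stand-in produced by `make_classification` (395 rows, 28 features), the class is assumed to be binary (one ROC curve per model), and every hyperparameter is a hypothetical choice (k = 5 neighbours, a linear kernel for "SVM" and an RBF kernel for "Kernel SVM", one hidden layer of 16 units for the ANN).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 395 x 28 student table (binary class assumed).
X, y = make_classification(n_samples=395, n_features=28, n_informative=8,
                           random_state=0)

# 75:25 train/test split, as described in the README.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Hypothetical hyperparameters for the eight classifier families.
models = {
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="linear", probability=True, random_state=0),
    "Kernel SVM": SVC(kernel="rbf", probability=True, random_state=0),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, clf in models.items():
    # The scaler inside the pipeline is fitted on the training split only.
    pipe = make_pipeline(StandardScaler(), clf).fit(X_train, y_train)
    acc = accuracy_score(y_test, pipe.predict(X_test))
    # AUC is computed from the predicted probability of the positive class.
    auc = roc_auc_score(y_test, pipe.predict_proba(X_test)[:, 1])
    results[name] = (acc, auc)

# Print the models ordered by AUC, highest first.
for name, (acc, auc) in sorted(results.items(), key=lambda kv: -kv[1][1]):
    print(f"{name:20s} accuracy={acc:.3f}  AUC={auc:.3f}")
```

With 395 rows, the 25% test split holds 99 rows, so each misclassified test row moves accuracy by about one percentage point. Repeating the comparison with cross-validation (for example `sklearn.model_selection.cross_val_score`) gives a steadier estimate than a single split.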

Fig: Flowchart of the solution

Fig: The preprocessed student-related variables

Fig: Histograms of the individual attributes

Fig: Combined histogram of all attributes

Fig: Bar plot of the class attribute

Fig: Correlation Heatmap

Fig: Confusion matrices for the Decision Tree and Random Forest classifiers

Fig: ROC curves and AUC of the applied classifiers