Mining-Student-Data-to-Predict-Result

Machine Learning Final Project

This project develops a model that predicts students' academic performance by mining student data. We analyze a dataset of student records from a high school in Portugal. The data are labeled (supervised) and numeric, were collected from Kaggle, and comprise 395 instances, 28 attributes, and 1 class attribute. The dataset has no missing values. The final dataset was produced by eliminating outliers and scaling the data.
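The description above does not say which outlier rule or scaler was used, so the following is only a minimal sketch of the idea, assuming Tukey's IQR fences for outlier removal and zero-mean, unit-variance standardization for scaling. The helper names (`iqr_bounds`, `drop_outlier_rows`, `standardize`) and the toy `records` are hypothetical and are not taken from the project.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences for one column of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, k=1.5):
    """Keep only rows in which every value lies inside its column's fences."""
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        row for row in rows
        if all(low <= value <= high for value, (low, high) in zip(row, bounds))
    ]


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy records (age, absences); the last row has an extreme absence count.
records = [(15, 2), (16, 4), (17, 0), (16, 6), (15, 3), (18, 5), (17, 4), (16, 75)]

cleaned = drop_outlier_rows(records)  # rows that pass the outlier filter
scaled = standardize(cleaned)         # same rows, each column standardized
```

In practice the same two steps are usually done with a data-frame library and a ready-made scaler; the from-scratch version is shown only to make the two operations explicit.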

We built models with the Naïve Bayes, KNN, SVM, Kernel SVM, ANN, Logistic Regression, Decision Tree, and Random Forest classifiers, using a 75:25 training-to-test split. Among them, the Decision Tree and Random Forest classifiers performed best, with 100% accuracy on the test set. Finally, the ROC curves and AUC values of the applied algorithms were compared. A dataset with more instances would allow the performance metrics to be assessed more reliably.
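The comparison described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual code: the data is a synthetic stand-in generated with `make_classification` (395 rows, 28 features, binary class), the hyperparameters are library defaults, and "SVM" and "Kernel SVM" are assumed to mean a linear and an RBF kernel respectively. The scores it produces say nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with the same shape as the described dataset (assumption).
X, y = make_classification(
    n_samples=395, n_features=28, n_informative=10, random_state=0
)

# 75:25 train/test split, stratified so both sets keep the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def scaled(estimator):
    """Wrap an estimator so scaling is fit on the training data only."""
    return make_pipeline(StandardScaler(), estimator)


models = {
    "Naive Bayes": GaussianNB(),
    "KNN": scaled(KNeighborsClassifier()),
    "SVM": scaled(SVC(kernel="linear", probability=True, random_state=0)),
    "Kernel SVM": scaled(SVC(kernel="rbf", probability=True, random_state=0)),
    "ANN": scaled(MLPClassifier(max_iter=1000, random_state=0)),
    "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class, used as the score for ROC AUC.
    positive_scores = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, positive_scores),
    }

# Print the models from highest to lowest AUC.
for name, scores in sorted(results.items(), key=lambda kv: -kv[1]["auc"]):
    print(f"{name:20s} accuracy={scores['accuracy']:.3f} auc={scores['auc']:.3f}")
```

Wrapping the scale-sensitive models in a pipeline keeps the scaler from seeing test rows, which matters more on a small dataset where a 25% test set is only about a hundred records.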

Fig: Flowchart of the solution

Fig: The preprocessed student related variables

Fig: Individual histogram of the attributes

Fig: Histogram of the attributes altogether

Fig: Bar plot of class attribute

Fig: Correlation Heatmap

Fig: Confusion matrix for Decision Tree and Random Forest classifier

Fig: ROC Curve and AUC of applied classifiers

shakib-sadat/Mining-Student-Data-to-Predict-Result
