This project uses the scikit-learn library in Python to classify anonymized credit card transactions as fraudulent or genuine. The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. It contains only numerical input variables, which are the result of a PCA transformation.
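As a quick check of how unbalanced the data is, the figures above can be turned into a fraud percentage and into the accuracy a trivial "always genuine" predictor would reach. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the project code.

```python
# Counts quoted in the dataset description
n_frauds = 492
n_transactions = 284_807

# Share of fraudulent transactions, as a percentage
fraud_pct = n_frauds / n_transactions * 100

# Accuracy of a hypothetical baseline that labels every transaction as genuine
baseline_accuracy = 100 - fraud_pct

print(f"Fraud share       : {fraud_pct:.2f}%")
print(f"Baseline accuracy : {baseline_accuracy:.2f}%")
```

Because the baseline is already so high, the accuracy figures reported for the three models are best read against it.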
In this module I'm working on three classification models to predict whether an anonymized credit card transaction is fraudulent or genuine: a Decision Tree Classifier, a K-nearest neighbors (KNN) Classifier, and a Random Forest Classifier.
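A minimal sketch of how the three models could be trained side by side with scikit-learn. It uses a small synthetic, imbalanced dataset from `make_classification` as a stand-in for the real transactions, and the hyperparameters (`n_neighbors=5`, `n_estimators=100`, the 70/30 split) are assumptions for illustration, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: roughly 5% positive class (assumption, not the real dataset)
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Stratify so both splits keep the same class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The three classifiers named above, with illustrative hyperparameters
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and keep its predictions on the held-out split
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```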
To do so, I carried out the following data processing: handled missing values by replacing them with the mean, and standardised the data.
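The two preprocessing steps could look roughly like this. The sketch assumes scikit-learn's `SimpleImputer` and `StandardScaler` were used, which the description does not confirm, and it runs on a tiny made-up matrix rather than the real data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix with two missing values
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [np.nan, 400.0],
    [4.0, 600.0],
])

# Step 1: replace each missing value with the mean of its column
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)

# Step 2: standardise each column to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)
```

When a train/test split is involved, fitting the imputer and scaler on the training split only and then applying `transform` to the test split avoids leaking test statistics into training.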
At the end, I calculate the accuracy and error rate of all three classifiers.
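One way to compute these two numbers is with `accuracy_score` from `sklearn.metrics`, expressing both as percentages so that they sum to 100. The labels below are made up for illustration, and the project may have computed the values differently.

```python
import math

from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels (1 = fraud, 0 = genuine)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# Accuracy is the fraction of correct predictions; scale to a percentage
accuracy = accuracy_score(y_true, y_pred) * 100

# Error rate is the complementary percentage of wrong predictions
error_rate = 100 - accuracy

print(f"Accuracy   : {accuracy}")
print(f"Error rate : {error_rate}")
```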
Decision Tree Accuracy :
Accuracy (%) : 99.87361325656508 -- (A better approach to follow)
Error rate (%) : 0.12638674343491083
Random Forest Accuracy :
Accuracy (%) : 99.92978514253616
Error rate (%) : 0.07021485746383935
K-nearest neighbors (KNN) Accuracy :
Accuracy (%) : 99.9578710855217
Error rate (%) : 0.04212891447830361
Thanks for looking at it :) Feel free to like :)