This is the group project for COMP7404B. The original dataset can be found on Kaggle. It contains credit card transactions made over a period of two days, with 492 frauds out of 284,807 transactions. All variables in the dataset are numerical. The data has been transformed with PCA for privacy reasons.
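To make the class imbalance concrete, the short calculation below uses only the two counts quoted above. The "always legitimate" baseline is a hypothetical model introduced for illustration; it is not one of the methods in this project.

```python
# Counts quoted in the dataset description.
n_fraud = 492
n_total = 284_807

# Share of transactions that are fraudulent.
fraud_rate = n_fraud / n_total

# Accuracy of a hypothetical model that labels every transaction as legitimate.
baseline_accuracy = (n_total - n_fraud) / n_total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```

Because such a baseline is already very accurate without catching a single fraud, accuracy says little here, and precision and recall on the fraud class are more informative.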
We use an Autoencoder Neural Network + SVM for anomaly detection in credit card transaction data and compare its performance with using only logistic regression or only SVM.
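The general idea of feeding autoencoder features into an SVM can be sketched as follows. This is a minimal illustration under several assumptions, not the code from the notebooks: it uses synthetic data instead of `creditcard.csv`, scikit-learn's `MLPRegressor` trained to reconstruct its input as a stand-in for the project's autoencoder, a 4-unit bottleneck, and `class_weight="balanced"` for the SVM. The `encode` helper is hypothetical.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 2000 "normal" rows and 40 "fraud" rows,
# each with 10 numeric features (assumption: the real file is not loaded here).
X_normal = rng.normal(0.0, 1.0, size=(2000, 10))
X_fraud = rng.normal(3.0, 1.0, size=(40, 10))
X = StandardScaler().fit_transform(np.vstack([X_normal, X_fraud]))
y = np.concatenate([np.zeros(2000, dtype=int), np.ones(40, dtype=int)])

# Autoencoder stand-in: an MLP trained to reconstruct its own input
# through a 4-unit hidden layer (the bottleneck).
autoencoder = MLPRegressor(
    hidden_layer_sizes=(4,), activation="relu", max_iter=300, random_state=0
)
autoencoder.fit(X, X)


def encode(model, data):
    """Return the hidden-layer (bottleneck) activations for `data`."""
    hidden = data @ model.coefs_[0] + model.intercepts_[0]
    return np.maximum(hidden, 0.0)  # ReLU, matching activation="relu"


# Encoded features produced by the bottleneck layer.
Z = encode(autoencoder, X)

# SVM trained on the encoded features instead of the raw ones.
classifier = SVC(kernel="rbf", class_weight="balanced")
classifier.fit(Z, y)
predictions = classifier.predict(Z)
```

The sketch predicts on the same rows it was trained on, so it only shows how the pieces connect; a real evaluation would need a held-out test split.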
.
├── autoencoder.ipynb
├── constrast.ipynb
├── dataset
│ └── creditcard.csv
├── docs
│ ├── autoencoder.html
│ ├── constrast.html
│ ├── Data analysis for credit card fraud detection.pptx
│ └── fraud_detection.html
├── fraud_detection.ipynb
├── model.h5
├── original.png
└── README.md
- `fraud_detection.ipynb`: the main file, with Autoencoder + SVM
- `autoencoder.ipynb`: an attempt to use the Autoencoder for classification
- `constrast.ipynb`: logistic regression and SVM
To view the results, downloading the `docs/` folder is enough.
Method | Precision | Recall | F-1 Score |
---|---|---|---|
AE + SVM | 0.84 | 0.75 | 0.79 |
Logistic Regression | 0.85 | 0.61 | 0.71 |
SVM | 0 | 0 | NA |
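The F-1 column can be checked from the precision and recall columns, since the F-1 score is their harmonic mean, 2PR / (P + R). The sketch below recomputes it from the table values; the helper name `f1_score` is chosen for illustration. For the SVM row the formula reduces to 0/0, which is why that score is reported as NA.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; None when both are zero (0/0)."""
    if precision + recall == 0:
        return None
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the results table.
results = {
    "AE + SVM": (0.84, 0.75),
    "Logistic Regression": (0.85, 0.61),
    "SVM": (0.0, 0.0),
}

f1_by_method = {name: f1_score(p, r) for name, (p, r) in results.items()}

for name, f1 in f1_by_method.items():
    print(name, "NA" if f1 is None else round(f1, 2))
```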
- `SVM` does not work for this imbalanced dataset.
- `AE + SVM` has nearly the same precision as `logistic regression`, but the recall of `AE + SVM` is higher, which results in a better F-1 score.
- `AE + SVM` performs best among the 3 methods.
The Autoencoder is able to extract sample features effectively and can be applied to imbalanced datasets.