Sampling

This Python code applies several sampling techniques to a credit card fraud detection dataset and then trains various ML models on the sampled data to find out which combination of model and sampling technique gives the best accuracy.

The following five sampling techniques were used:

1. Random Under Sampler
2. Random Over Sampler
3. Tomek links
4. Convenience sampling
5. Simple random sampling
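
The five techniques above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the notebook's own code (the notebook may well use a library such as imbalanced-learn for the first three). All function names and the toy data are assumptions made for this example. A Tomek link is taken here to be a pair of rows from different classes that are each other's nearest neighbor. In under-sampling, the majority-class row of each such pair is typically the one removed.

```python
import random
from collections import Counter


def group_by_class(rows, labels):
    """Map each class label to the list of rows carrying that label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = min(len(g) for g in groups.values())
    sampled = [(row, lab) for lab, g in groups.items() for row in rng.sample(g, target)]
    return [r for r, _ in sampled], [lab for _, lab in sampled]


def random_over_sample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = group_by_class(rows, labels)
    target = max(len(g) for g in groups.values())
    sampled = []
    for lab, g in groups.items():
        extra = [rng.choice(g) for _ in range(target - len(g))]
        sampled.extend((row, lab) for row in g + extra)
    return [r for r, _ in sampled], [lab for _, lab in sampled]


def simple_random_sample(rows, labels, n, seed=0):
    """Draw n rows uniformly at random, without replacement."""
    rng = random.Random(seed)
    picked = rng.sample(range(len(rows)), n)
    return [rows[i] for i in picked], [labels[i] for i in picked]


def convenience_sample(rows, labels, n):
    """Take the first n rows, i.e. whatever is easiest to reach."""
    return rows[:n], labels[:n]


def nearest_neighbor(i, rows):
    """Index of the row closest to rows[i] by squared Euclidean distance."""
    others = (j for j in range(len(rows)) if j != i)
    return min(others, key=lambda j: sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))


def tomek_links(rows, labels):
    """Pairs of opposite-class rows that are each other's nearest neighbor."""
    links = []
    for i in range(len(rows)):
        j = nearest_neighbor(i, rows)
        if i < j and labels[i] != labels[j] and nearest_neighbor(j, rows) == i:
            links.append((i, j))
    return links


# Toy one-feature data (made up): class 1 plays the minority "fraud" class.
rows = [(0.0,), (1.0,), (1.2,), (5.0,), (5.1,), (9.0,)]
labels = [0, 0, 1, 0, 0, 1]

_, under_labels = random_under_sample(rows, labels)
_, over_labels = random_over_sample(rows, labels)
print(Counter(under_labels))      # class counts after under-sampling
print(Counter(over_labels))       # class counts after over-sampling
print(tomek_links(rows, labels))  # index pairs forming Tomek links
```

Random under-sampling and over-sampling both end with equal class counts, but the first discards majority rows while the second repeats minority rows. Convenience sampling ignores class balance entirely.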

The sample size was calculated using the following formula:

n = Z^2 * p(1 - p) / m^2

where:
n = sample size
Z = z-value (for a 95% confidence interval, Z = 1.96)
p = proportion of the minority class (taken as 0.5 for a balanced dataset)
m = margin of error (taken as 0.1 for a sample size of 1000)
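
As a worked check of the formula, the short sketch below plugs in the stated values Z = 1.96, p = 0.5 and m = 0.1. The function name `sample_size` is an assumption made for this example and does not come from the notebook. Rounding up to a whole number of rows is a common convention, not something the text above specifies.

```python
import math


def sample_size(z, p, m):
    """Evaluate n = Z^2 * p * (1 - p) / m^2."""
    return (z ** 2) * p * (1 - p) / (m ** 2)


n = sample_size(z=1.96, p=0.5, m=0.1)
print(round(n, 2))   # 96.04
print(math.ceil(n))  # 97, rounded up to a whole number of rows
```

With these inputs the formula evaluates to about 96, so a smaller margin of error is needed to reach a larger sample. For example, the same function returns roughly 384 for m = 0.05.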

The following five models were applied to the sampled dataset:

1. Logistic Regression (M1)
2. Decision Tree Classifier (M2)
3. Random Forest Classifier (M3)
4. KNeighborsClassifier (M4)
5. GaussianNB (M5)
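
A minimal sketch of how the five models could be fitted and scored with scikit-learn is shown here. It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` as a stand-in for one sampled dataset, not the credit card data. The hyperparameters, the 70/30 split and the variable names are assumptions made for this example and may differ from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one sampled dataset (NOT the credit card data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# The five models, keyed by the labels M1 to M5 used in the list above.
models = {
    "M1": LogisticRegression(max_iter=1000),
    "M2": DecisionTreeClassifier(random_state=0),
    "M3": RandomForestClassifier(random_state=0),
    "M4": KNeighborsClassifier(),
    "M5": GaussianNB(),
}

# Fit each model on the training split and record its test accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.4f}")
```

Repeating this loop once per sampled dataset would give one row of accuracies per sampling technique.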

Executing the code gave the following results. Each cell of the table shows the accuracy of the applied model under the respective sampling technique.

[Image: table of accuracies for each model and sampling technique]

As the table shows, the maximum accuracy obtained is 99.35%.