ArntheGitHub/Fraud-Detection-in-Retail-W6
The aim of this analysis is to train a fraud-detection model on a data set of 300,000 cases (W06_training.txt). The model's predictions are then checked against a second data set of 100,000 purchases (W06_scoring.txt) for which we do not know the target variable. Performance on that set is measured by the total cost or total revenue the predictions produce. Because the final evaluation happens on unseen data, the model must not overfit the training data, otherwise its predictions on the new data set will be poor. To detect and limit overfitting, we should hold out part of the training data as a test set or use a suitable cross-validation method.
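A minimal sketch of the cross-validation idea described above, written in plain Python so it runs without extra packages. Everything in it is an assumption made for illustration: the synthetic labels and feature stand in for W06_training.txt, the single-threshold "model" stands in for whatever classifier the notebook actually trains, and the payoff values in `PAYOFF` are hypothetical, since the real cost and revenue figures are not given here. In practice a library routine such as scikit-learn's `StratifiedKFold` would likely replace the hand-written splitter.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so rare fraud cases are spread evenly.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# HYPOTHETICAL payoff matrix (not from the source): (actual, predicted) -> revenue.
PAYOFF = {(0, 0): 0.0, (0, 1): -25.0, (1, 0): -5.0, (1, 1): 5.0}


def total_revenue(y_true, y_pred, payoff=PAYOFF):
    """Sum the payoff of every (actual, predicted) pair."""
    return sum(payoff[(t, p)] for t, p in zip(y_true, y_pred))


def fit_threshold(xs, ys):
    """Toy 'model': pick the cut-off on one feature that maximises training revenue."""
    candidates = sorted(set(round(x, 1) for x in xs))
    return max(candidates, key=lambda t: total_revenue(ys, [int(x > t) for x in xs]))


# Synthetic stand-in data: about 5% fraud, fraud cases tend to have a higher feature value.
rng = random.Random(42)
labels = [1 if rng.random() < 0.05 else 0 for _ in range(1000)]
scores = [rng.gauss(2.0 if y else 0.0, 1.0) for y in labels]

fold_revenues = []
for train_idx, test_idx in stratified_k_fold(labels, k=5, seed=0):
    # Fit only on the training part, evaluate only on the held-out part.
    t = fit_threshold([scores[i] for i in train_idx], [labels[i] for i in train_idx])
    preds = [int(scores[i] > t) for i in test_idx]
    fold_revenues.append(total_revenue([labels[i] for i in test_idx], preds))

print("revenue per held-out fold:", fold_revenues)
print("mean held-out revenue:", sum(fold_revenues) / len(fold_revenues))
```

If the revenue on the held-out folds is much lower than the revenue on the training folds, that gap is a sign of overfitting. Stratifying the folds matters here because fraud cases are usually rare, and an unstratified split could leave a fold with almost none.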
Jupyter Notebook