Goal:

  • analyze the dataset and come up with a model that best detects fraudulent transactions
  • compare different popular models and determine which ones perform better
  • explore machine learning validation metrics to judge the quality of each model (AUPRC vs AUC-ROC; see the metric sketch after this list)
  • with the highly imbalanced dataset, try different data-balancing techniques (oversampling/undersampling)
  • go back and use an oversampling method to see how the models change with the availability of more data
  • dig into each method's parameters and tune them to see if we can get better results; determine whether or not the dataset is well suited to this kind of problem
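
The AUPRC vs AUC-ROC comparison could look something like the sketch below. It assumes scikit-learn, an already-fitted classifier `clf` with `predict_proba`, and a held-out split `X_test`/`y_test` (all placeholder names):

```python
# Sketch: comparing AUPRC and AUC-ROC for one fitted model.
# `clf`, `X_test`, `y_test` are placeholders for a fitted classifier and a held-out split.
from sklearn.metrics import average_precision_score, roc_auc_score

def compare_metrics(clf, X_test, y_test):
    # Use the predicted probability of the positive (fraud) class.
    scores = clf.predict_proba(X_test)[:, 1]
    auprc = average_precision_score(y_test, scores)   # area under the precision-recall curve
    auroc = roc_auc_score(y_test, scores)             # area under the ROC curve
    # With ~0.172% positives, AUC-ROC can look deceptively high because it rewards
    # ranking the huge negative class correctly; AUPRC is dominated by performance
    # on the rare fraud class, so it is usually the more informative number here.
    return {"AUPRC": auprc, "AUC-ROC": auroc}
```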

Notes:

  • Dataset is a financial transactions dataset from Kaggle
  • Reduced feature set: 28 features produced by a prior PCA analysis; the original features were scrubbed for user anonymity
  • Time & Amount are the only two original features
  • Total samples in dataset: 284,807. Fraudulent transactions: 492 (0.172% of all transactions), labelled by the "Class" feature (quick imbalance check below)
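
A quick sanity check of those numbers might look like this, assuming the Kaggle CSV is saved locally as `creditcard.csv` (the filename is an assumption) and that fraud is labelled `Class == 1`:

```python
# Sketch: confirm the class imbalance described above.
import pandas as pd

df = pd.read_csv("creditcard.csv")          # filename is an assumption
counts = df["Class"].value_counts()
print(counts)                                   # expect roughly 284,315 vs 492
print(f"fraud rate: {counts[1] / len(df):.3%}") # expect ~0.172%
```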

TODO:

  • get XGBoost working (XGBoost sketch after this list)
  • create a baseline model comparison dataframe with the confusion matrix results of all the models (comparison-table sketch below)
  • do the AUPRC vs AUC-ROC comparison / analysis
  • do RandomUnderSampler, then a more refined undersampling variant (resampling sketch below)
  • do an oversampling method targeting 5k, 10k, 100k, and full parity
  • re-run the same algorithms and see how they change as more data becomes available, or as the data changes
  • visualize the efficacy of each model, time permitting
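
One way to get a first XGBoost baseline running on the imbalanced data is sketched below. It assumes the `xgboost` package is installed and that `X_train`/`X_test`/`y_train`/`y_test` splits already exist (placeholder names); `scale_pos_weight` is seeded from the class ratio as a common starting point, not a tuned value:

```python
# Sketch: a first XGBoost baseline on the imbalanced data.
from xgboost import XGBClassifier
from sklearn.metrics import confusion_matrix, average_precision_score

# Weight the rare positive class by the negative/positive ratio as a starting point.
scale = (y_train == 0).sum() / (y_train == 1).sum()

xgb = XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.1,
    scale_pos_weight=scale,
    eval_metric="aucpr",   # track area under the PR curve during training
)
xgb.fit(X_train, y_train)

probs = xgb.predict_proba(X_test)[:, 1]
print(confusion_matrix(y_test, xgb.predict(X_test)))
print("AUPRC:", average_precision_score(y_test, probs))
```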
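
The baseline comparison dataframe could be built along these lines; the model list and settings are placeholders rather than final choices:

```python
# Sketch: baseline comparison DataFrame with confusion-matrix results per model.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, average_precision_score, roc_auc_score

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, n_jobs=-1),
    # "xgboost": XGBClassifier(...),  # add once XGBoost is working
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    scores = model.predict_proba(X_test)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_test, preds).ravel()
    rows.append({
        "model": name,
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "AUPRC": average_precision_score(y_test, scores),
        "AUC-ROC": roc_auc_score(y_test, scores),
    })

baseline_df = pd.DataFrame(rows).set_index("model")
print(baseline_df)
```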
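
For the resampling TODOs, a sketch using imbalanced-learn is below: `RandomUnderSampler` covers the undersampling item, and `SMOTE` covers the 5k/10k/100k/parity targets (using SMOTE rather than plain random oversampling is an assumption). Resampling is applied to the training split only:

```python
# Sketch: undersampling and oversampling with imbalanced-learn.
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

# Undersampling: shrink the majority class down to the minority class size.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)

# Oversampling: grow the minority (fraud) class to fixed sizes, then to full parity.
n_majority = (y_train == 0).sum()
for n_minority in (5_000, 10_000, 100_000, n_majority):
    sm = SMOTE(sampling_strategy={1: n_minority}, random_state=42)
    X_over, y_over = sm.fit_resample(X_train, y_train)
    # ...refit each baseline model on (X_over, y_over) and re-score on the untouched X_test
```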

Helpful Articles: