
Binary Classification Models with PySpark in Apache Spark

Primary Language: Jupyter Notebook

Credit Card Anomaly Detection


Experiment with the binary classification models below, combined with Principal Component Analysis (PCA) in Apache Spark, and select the most appropriate one based on the area under the ROC curve (AUC).

  • Logistic Regression
  • Random Forest Classification
  • Linear Support Vector Classification
  • Gradient Boosted Tree Classification
  • Naive Bayes Classification
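
The comparison described above could be set up roughly as follows. This is a minimal sketch, not the notebook's actual code. It assumes a DataFrame with an assembled vector column named `features` and a 0/1 column named `label`. The function names, the `pca_features` column, and `k=3` are hypothetical choices. One caveat: in Spark 2.4 `NaiveBayes` rejects negative feature values, and PCA output can be negative, so the sketch trains Naive Bayes on the original features, which are assumed to be non-negative.

```python
def build_pipelines(k=3, features_col="features", label_col="label"):
    """Return {model name: Pipeline}. Column names and k are assumptions.

    Call this only after a SparkSession exists, since PySpark estimators
    need an active SparkContext when they are constructed.
    """
    # Imported lazily so select_best() below can be used without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import (
        GBTClassifier,
        LinearSVC,
        LogisticRegression,
        NaiveBayes,
        RandomForestClassifier,
    )
    from pyspark.ml.feature import PCA

    # Project the input features onto the first k principal components.
    pca = PCA(k=k, inputCol=features_col, outputCol="pca_features")

    on_pca = {
        "LogisticRegression": LogisticRegression,
        "RandomForestClassifier": RandomForestClassifier,
        "LinearSVC": LinearSVC,
        "GBTClassifier": GBTClassifier,
    }
    pipelines = {
        name: Pipeline(
            stages=[pca, cls(featuresCol="pca_features", labelCol=label_col)]
        )
        for name, cls in on_pca.items()
    }
    # Naive Bayes is fed the original (assumed non-negative) features,
    # because PCA components may be negative.
    pipelines["NaiveBayes"] = Pipeline(
        stages=[NaiveBayes(featuresCol=features_col, labelCol=label_col)]
    )
    return pipelines


def auc_by_model(train_df, test_df, pipelines, label_col="label"):
    """Fit each pipeline on train_df and return {name: AUC on test_df}."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    evaluator = BinaryClassificationEvaluator(
        rawPredictionCol="rawPrediction",
        labelCol=label_col,
        metricName="areaUnderROC",
    )
    return {
        name: evaluator.evaluate(pipeline.fit(train_df).transform(test_df))
        for name, pipeline in pipelines.items()
    }


def select_best(scores):
    """Return the model name with the highest AUC."""
    return max(scores, key=scores.get)
```

A typical call would split the data first, for example `train_df, test_df = df.randomSplit([0.7, 0.3], seed=42)`, then run `select_best(auc_by_model(train_df, test_df, build_pipelines()))`. The split ratio and seed here are illustrative only.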


The following package must be installed:

pyspark                   2.4.5                      py_0 


Data set: Statlog (German Credit Data) Data Set


Reference: Machine Learning with PySpark (ISBN 978-1-4842-4130-1)