This is inspired by https://docs.databricks.com/spark/latest/mllib/binary-classification-mllib-pipelines.html