Heart Disease Prediction Using Spark 2.0 With Java

Source code: Three Java files predict heart disease on the Cleveland dataset, one for each of the following algorithms:

  1. Logistic regression based: It does not work well because the dataset is categorical.
  2. Naive Bayes: It can predict the chance of heart disease, but its prediction accuracy is not very good.
  3. Random Forest: It works pretty well. Finding out why is left to you as an exercise.
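
To compare the three implementations on equal terms, it helps to compute the same accuracy figure for each one. The following is a minimal sketch in plain Java, not taken from the three source files: the class name AccuracySketch and the sample arrays are made up for illustration, and labels are assumed to be encoded as 0 (no disease) and 1 (disease).

```java
// Hypothetical helper for illustration only; not part of the original source files.
final class AccuracySketch {

    /** Returns the fraction of predictions that match the actual labels. */
    static double accuracy(int[] predicted, int[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == actual[i]) {
                correct++;
            }
        }
        return (double) correct / predicted.length;
    }

    public static void main(String[] args) {
        // Made-up labels: 4 of the 5 predictions match.
        int[] actual = {0, 1, 1, 0, 1};
        int[] predicted = {0, 1, 0, 0, 1};
        System.out.println(accuracy(predicted, actual)); // prints 0.8
    }
}
```

Running the same check on the held-out predictions of each model gives one number per algorithm that can be compared directly.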

You can refer to my book "Large Scale Machine Learning with Spark" at https://www.packtpub.com/big-data-and-business-intelligence/large-scale-machine-learning-spark

Finally, you should reuse the attached Maven-friendly pom.xml file for your project setup. You can also change the Spark version or other dependency versions if you want.

Dataset: Use the dataset named "processed_cleveland.data".
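
Before the data reaches any of the three models, each line has to be turned into numbers. The sketch below shows one way to do that in plain Java. It assumes the file follows the usual layout of the UCI processed Cleveland data: 14 comma-separated values per line, "?" for a missing value, and the last value as the diagnosis (0 for no disease, 1 to 4 for disease). The class name ClevelandRow, the NaN placeholder for missing values, and the collapse to a binary label are illustrative choices, not necessarily what the Java files in this project do.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the original source files.
final class ClevelandRow {

    // Assumed layout: 13 features followed by the diagnosis column.
    static final int FIELD_COUNT = 14;

    /** Parses one comma-separated line; a "?" token becomes Double.NaN. */
    static double[] parse(String line) {
        String[] tokens = line.trim().split(",");
        if (tokens.length != FIELD_COUNT) {
            throw new IllegalArgumentException(
                "Expected " + FIELD_COUNT + " fields but got " + tokens.length);
        }
        double[] values = new double[FIELD_COUNT];
        for (int i = 0; i < FIELD_COUNT; i++) {
            String token = tokens[i].trim();
            values[i] = token.equals("?") ? Double.NaN : Double.parseDouble(token);
        }
        return values;
    }

    /** Collapses the diagnosis column to 0 (no disease) or 1 (disease present). */
    static int binaryLabel(double[] row) {
        return row[FIELD_COUNT - 1] > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Illustrative row in the assumed layout.
        double[] row = parse("63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0");
        System.out.println(Arrays.toString(row) + " -> label " + binaryLabel(row));
    }
}
```

Rows that contain NaN can then be dropped or imputed before training, depending on what the chosen algorithm tolerates.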