Forest fires data set predictions using machine learning methods

The project contains three methods, each tuned using GridSearch:

  • Support Vector Machine
  • Decision Tree Regressor
  • Random Forest

The prediction errors of the three methods are evaluated and compared.
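
The tune-then-compare workflow for the three methods might look roughly like the sketch below. It assumes scikit-learn, which this README does not confirm (the notebook may use a different library), and it runs on synthetic stand-in data rather than the forest fires data set. The parameter grids, the choice of mean absolute error as the metric, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the forest fires data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=120)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical parameter grids, chosen only to keep the example small.
candidates = {
    "Support Vector Machine": (
        SVR(),
        {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
    ),
    "Decision Tree Regressor": (
        DecisionTreeRegressor(random_state=0),
        {"max_depth": [2, 4, 8]},
    ),
    "Random Forest": (
        RandomForestRegressor(random_state=0),
        {"n_estimators": [20, 50], "max_depth": [4, 8]},
    ),
}

errors = {}
for name, (estimator, grid) in candidates.items():
    # Cross-validated search over the grid; the best model is refit on
    # the full training split (refit=True is the default).
    search = GridSearchCV(estimator, grid, scoring="neg_mean_absolute_error", cv=3)
    search.fit(X_train, y_train)
    # Score the refit best model on the held-out test split.
    errors[name] = mean_absolute_error(y_test, search.predict(X_test))
    print(f"{name}: best params {search.best_params_}, test MAE {errors[name]:.3f}")
```

Scoring every tuned model on the same held-out split keeps the error comparison like for like.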

Data source: UC Irvine Machine Learning Repository

This project is a continuation of the tutorial on setting up an Apache Spark cluster.

The Jupyter notebook runs on the master node of a two-node Apache Spark cluster on Amazon Web Services. The data set is held in an AWS S3 bucket and downloaded directly from there.
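
One possible way to read a CSV file straight from S3 is sketched below. This is an assumption about the mechanism, not the notebook's actual loading code: the helper `load_csv_from_s3` is hypothetical, and the bucket and key names in the usage comment are placeholders. The function takes an S3 client as an argument, for example one created with `boto3.client("s3")`, and relies only on that client's `get_object` call.

```python
import csv
import io


def load_csv_from_s3(s3_client, bucket, key):
    """Download a CSV object from S3 and return its rows as a list of dicts.

    `s3_client` is expected to behave like a boto3 S3 client: its
    get_object(Bucket=..., Key=...) call returns a dict whose "Body"
    entry has a read() method returning the object's bytes.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response["Body"].read().decode("utf-8")
    # The first CSV line is treated as the header row.
    return list(csv.DictReader(io.StringIO(text)))


# Usage (needs boto3 and AWS credentials; bucket and key are placeholders):
# import boto3
# rows = load_csv_from_s3(boto3.client("s3"), "my-bucket", "forestfires.csv")
```

Passing the client in, rather than creating it inside the function, makes the helper easy to exercise without network access.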