This project is a machine learning application using Apache Spark ML.

Please first download and install Apache Spark by following the instructions on the official Apache Spark website (https://spark.apache.org/downloads.html).

After the installation, update the path in "util.py" and "" by searching for the keyword "spark_submit_location" and replacing that line with the following:

spark_submit_location = '/home/rwu/Desktop/spark-2.3.0-bin-hadoop2.7/bin/spark-submit'

Here, "/home/rwu/Desktop/spark-2.3.0-bin-hadoop2.7/bin/spark-submit" should be the spark-submit path.

The sample data is stored in the "data" folder. "prms_input.csv" is for Case Study 1; "1.csv" and "2.csv" are for Case Study 2. These files were generated by hydrologic models for streamflow prediction. If you want to use our prototype to improve the accuracy of your own model, put all of your data into a single CSV file whose first column contains the observations (true values) and whose second column contains your model predictions (see "prms_input.csv" for the format).
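Below is a minimal sketch (not taken from the repository) of reading a CSV in this two-column layout; it assumes the file is at "data/prms_input.csv" relative to the working directory and simply skips any header or malformed rows.

```python
import csv

observations, predictions = [], []
with open('data/prms_input.csv') as f:
    for row in csv.reader(f):
        try:
            # First column: observation (true value); second column: model prediction.
            obs, pred = float(row[0]), float(row[1])
        except (ValueError, IndexError):
            continue  # skip a header row or malformed line
        observations.append(obs)
        predictions.append(pred)

print('loaded %d (observation, prediction) pairs' % len(observations))
```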

# Quick Start

First, create a virtual environment:

mkvirtualenv -p python2.7 dev

If mkvirtualenv is not available, you can create and activate the environment with plain virtualenv instead:

virtualenv -p python2.7 dev && source dev/bin/activate

Here is the command to install the requirements:

pip install -r requirements.txt

Here is the command to run the program:

python test_TD_machine_learning_recursive_training_bound_transform.py

Here is the command to visualize the results:

python vis_rmse_all_results.py
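The script name suggests the results are reported as RMSE (root-mean-square error). As background only, here is a minimal sketch of computing RMSE from (observation, prediction) pairs; the sample pairs are made up for illustration and do not come from the repository.

```python
import math

# Hypothetical example pairs: (observation, prediction).
pairs = [(1.0, 1.2), (2.0, 1.8), (3.0, 3.1)]

# RMSE = sqrt(mean of the squared errors between observations and predictions).
rmse = math.sqrt(sum((obs - pred) ** 2 for obs, pred in pairs) / len(pairs))
print('RMSE = %.3f' % rmse)
```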