This program builds a model that predicts, from a dataset of advertising data, whether or not the user clicks. A trained model is already saved, so you only have to provide your data and the trained model will be applied to it. The program takes a JSON file as input and produces a CSV file as output, whose first column contains the predicted values.
To launch the program, follow these instructions:
- Clone the current git repository.
- In the terminal, move to the root folder of the cloned project.
- Run the command
sbt assembly
to create the JAR file.
- Then run the command
mv target/scala-2.12/adprediction-assembly-1.0.jar adprediction.jar
- Launch the program with
java -jar adprediction.jar [param]
param is either run or predict. Use run to retrain the model and predict to predict the labels of a data file.
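The build and rename steps above can be sketched as a small shell helper. This is only an illustration: the function names find_assembly_jar and install_jar are hypothetical, and the sketch assumes the assembled JAR matches adprediction-assembly-*.jar somewhere under the target folder, so it does not depend on the exact Scala version directory.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the project): locate the JAR produced by
# `sbt assembly` and rename it to adprediction.jar.
# Assumption: the JAR name matches adprediction-assembly-*.jar under target/.

find_assembly_jar() {
    # Print the first matching JAR path, or nothing if no build output exists.
    find "${1:-target}" -type f -name 'adprediction-assembly-*.jar' 2>/dev/null | head -n 1
}

install_jar() {
    jar=$(find_assembly_jar "${1:-target}")
    if [ -z "$jar" ]; then
        echo "No assembled JAR found; run 'sbt assembly' first." >&2
        return 1
    fi
    # Move the JAR into the current directory under the name used when launching.
    mv "$jar" adprediction.jar
}
```

From the project root, this could be used as `sbt assembly && install_jar`, followed by the java -jar command shown above.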
To train a new model, first make sure that a file called data-students.json exists in the project's directory and that it contains a label column.
If a folder named models exists, please delete it beforehand.
Then, train a new model by running
java -jar adprediction.jar train
Training should take a few minutes, and the results are stored in models/LogisticRegression. The metrics for the trained model are shown during the process.
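The checks to perform before training can be sketched as a shell function. The name prepare_training is hypothetical, the optional argument is assumed to be the project directory, and the label check is only a text search for a "label" key, not a real JSON validation.

```shell
#!/bin/sh
# Hypothetical pre-training check (not part of the project).
# Assumption: $1 is the project directory (defaults to the current directory).

prepare_training() {
    dir="${1:-.}"
    data="$dir/data-students.json"
    if [ ! -f "$data" ]; then
        echo "Missing $data" >&2
        return 1
    fi
    # Heuristic only: look for a "label" key anywhere in the file.
    if ! grep -q '"label"' "$data"; then
        echo "$data does not appear to contain a label column" >&2
        return 1
    fi
    # Remove a models folder left over from an earlier training run.
    rm -rf "$dir/models"
}
```

A possible use is `prepare_training && java -jar adprediction.jar train`, so that training only starts when the checks pass.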
The prediction requires a model to be created beforehand: the folder models/LogisticRegression must exist and must not be empty.
If you have an output folder, please delete it before running the prediction.
To predict the outcome of input values, run
java -jar adprediction.jar predict [filename]
where filename is the path to a JSON file.
Please note that if your data contains a label attribute, it will be replaced during the process by predicted values.
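The preconditions for prediction can be checked in the same way. This is a sketch under assumptions: prepare_prediction is a hypothetical name, its first argument is the input JSON file and its optional second argument is the project directory.

```shell
#!/bin/sh
# Hypothetical pre-prediction check (not part of the project).
# Assumptions: $1 is the input JSON file, $2 the project directory (default: .).

prepare_prediction() {
    input="$1"
    dir="${2:-.}"
    model="$dir/models/LogisticRegression"
    if [ ! -f "$input" ]; then
        echo "Input file not found: $input" >&2
        return 1
    fi
    # The model folder must exist and contain at least one entry.
    if [ ! -d "$model" ] || [ -z "$(ls -A "$model")" ]; then
        echo "No trained model in $model; train a model first." >&2
        return 1
    fi
    # Remove an output folder left over from an earlier prediction.
    rm -rf "$dir/output"
}
```

A possible use is `prepare_prediction data.json && java -jar adprediction.jar predict data.json`, where data.json stands for your own input file.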
The results are stored in a folder called output. Inside, you will find several files, including one CSV file containing the results. The CSV's name changes because of Spark's implementation of workers, but it should always follow the scheme part-0000-xxxx.csv.
The predicted label is stored in the first column, called label, and its value is either true or false.
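Because the CSV name changes between runs, it can be convenient to locate it by pattern. The sketch below uses two hypothetical helpers, find_result_csv and count_label. It assumes that the file is comma-separated, that its first line is a header, and that the first column holds the label as described above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the project) for inspecting the results.
# Assumptions: comma-separated file, one header line, label in the first column.

find_result_csv() {
    # The exact file name varies, so match on the part-*.csv pattern.
    for f in "${1:-output}"/part-*.csv; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

count_label() {
    # $1: CSV file, $2: label value to count (true or false).
    # Skip the header line and compare the first column with the requested value.
    awk -F, -v v="$2" 'NR > 1 && $1 == v { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_label "$(find_result_csv)" true` prints how many rows were predicted as true.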