SparkCredit

SparkCredit trains a creditability prediction model with SVM and predicts the creditability of new credit applications received from a Kafka stream.

Build

To build this app:

   git clone https://github.com/maochf/SparkCredit.git
   mvn clean package

The build creates a spark-credit-1.0-SNAPSHOT-jar-with-dependencies.jar file under the target directory; this jar can be used to deploy the app to Spark.

Run

First, download Spark and set $SPARK_HOME to the Spark root directory:

> export SPARK_HOME=<spark_root_directory>

You can train the SVM model by running the following command from the SparkCredit project directory:

> $SPARK_HOME/bin/spark-submit --class "ml.Credit" --master local[2] target/spark-credit-1.0-SNAPSHOT.jar

The above command saves the trained model under the SparkCredit/model/svm_model directory.
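For reference, the training step roughly corresponds to the MLlib sketch below. This is a minimal sketch, not the actual ml.Credit implementation; the training file path data/credit.csv, the object name, and the iteration count are assumptions.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    object CreditSvmSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("CreditSvmSketch"))

        // Parse each CSV row into a LabeledPoint: creditability label first,
        // the remaining numeric columns become the feature vector.
        // "data/credit.csv" is a hypothetical training file path.
        val data = sc.textFile("data/credit.csv").map { line =>
          val parts = line.split(',').map(_.toDouble)
          LabeledPoint(parts.head, Vectors.dense(parts.tail))
        }.cache()

        // Train a linear SVM and persist it where the streaming consumer expects it.
        val model = SVMWithSGD.train(data, 100)
        model.save(sc, "model/svm_model")

        sc.stop()
      }
    }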

After the model is trained, the streaming step requires Zookeeper and a Kafka server to be running. The following commands start a local Zookeeper and Kafka server from the Kafka root directory:

> bin/zookeeper-server-start.sh config/zookeeper.properties
> bin/kafka-server-start.sh config/server.properties

Then run kafka.CreditKafkaProducer from your IDE to start a Kafka producer that reads from data/stream.csv and sends new credit applications to the topic "credit". A producer along these lines would do the job; this is a minimal sketch rather than the actual kafka.CreditKafkaProducer, and the broker address localhost:9092 is an assumption:
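
    import java.util.Properties
    import scala.io.Source
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object CreditProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")  // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        // Each CSV line represents one new credit application.
        for (line <- Source.fromFile("data/stream.csv").getLines()) {
          producer.send(new ProducerRecord[String, String]("credit", line))
        }
        producer.close()
      }
    }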

Run the following command to consume credit applications from the Kafka topic "credit" and make creditability predictions with model/svm_model:

> $SPARK_HOME/bin/spark-submit --class "kafka.CreditPredictConsumer" --master local[2] \
target/spark-credit-1.0-SNAPSHOT-jar-with-dependencies.jar model/svm_model credit
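
For context, a minimal consumer that loads the saved model and scores each incoming record could look like the sketch below. It assumes the spark-streaming-kafka-0-8 integration, a local broker at localhost:9092, and comma-separated numeric features per message; the actual kafka.CreditPredictConsumer may differ.

    import kafka.serializer.StringDecoder
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.mllib.classification.SVMModel
    import org.apache.spark.mllib.linalg.Vectors

    object CreditConsumerSketch {
      def main(args: Array[String]): Unit = {
        // args(0) is the saved model path, args(1) the Kafka topic,
        // matching the spark-submit arguments shown above.
        val Array(modelPath, topic) = args
        val sc = new SparkContext(new SparkConf().setAppName("CreditConsumerSketch"))
        val ssc = new StreamingContext(sc, Seconds(5))

        // Load the SVM model trained earlier.
        val model = SVMModel.load(sc, modelPath)

        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")  // assumed local broker
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set(topic))

        // Each message is assumed to be a CSV row of numeric features;
        // predict creditability for every incoming record and print the result.
        stream.map { case (_, line) =>
          val features = Vectors.dense(line.split(',').map(_.toDouble))
          (line, model.predict(features))
        }.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }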