Spark Recommender

Scalable recommendation system written in Scala using the Apache Spark framework.

Implemented algorithms include:

k-nearest neighbors
k-nearest neighbors with clustering
k-nearest neighbors with a cluster tree
Alternating Least Squares (ALS) from Spark's MLlib
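
For orientation, the basic k-nearest neighbors idea can be sketched in a few lines. This is a generic illustration only, not this project's Scala/Spark implementation: it assumes user-based neighbors, cosine similarity, and a similarity-weighted average. The names cosine_similarity and recommend_knn are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts {productID: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[p] * b[p] for p in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend_knn(target, others, k=5, max_products=10):
    """Rank products the target user has not rated, using the k most similar users."""
    # Keep the k users with the highest similarity to the target user.
    neighbors = sorted(
        ((cosine_similarity(target, other), other) for other in others),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    weighted, weights = {}, {}
    for sim, ratings in neighbors:
        if sim <= 0:
            continue  # ignore users with nothing in common
        for product, rating in ratings.items():
            if product in target:
                continue  # only recommend unseen products
            weighted[product] = weighted.get(product, 0.0) + sim * rating
            weights[product] = weights.get(product, 0.0) + sim

    # Predicted rating = similarity-weighted average of the neighbors' ratings.
    predictions = {p: weighted[p] / weights[p] for p in weighted}
    ranked = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:max_products]
```

Here k plays the role of the numberOfNeighbors parameter and max_products the role of --products; the clustered variants presumably restrict the neighbor search to part of the user base, but see reportAndDocumentation.pdf for what the project actually does.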

This first version was created during the eClub Summer Camp 2014 at Czech Technical University.
See reportAndDocumentation.pdf for benchmark results and documentation.

Build

Spark Recommender is built with Simple Build Tool (SBT). Run the command:

sbt assembly

This creates the JAR file in the target/scala-2.10/ directory.

Run

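The application can be run using the spark-submit script. Options for spark-submit itself, including --class, must come before the JAR file; everything after the JAR file is passed to the recommender as its parameters.

cd target/scala-2.10/

$SPARK_HOME/bin/spark-submit --class Boot --master local --driver-memory 2G --executor-memory 6G SparkRecommender-assembly-0.1.jar (+ parameters of the recommender)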

For example, with Spark 1.4.1 installed under /opt/mapr/spark/:

/opt/mapr/spark/spark-1.4.1/bin/spark-submit --class Boot --master local[*] --driver-memory 2G --executor-memory 6G SparkRecommender-assembly-0.1.jar --data movieLens --dir /tmp --method kNN -p numberOfNeighbors=5 --interface 0.0.0.0 --port 9527

See the Spark documentation for details on the parameters of spark-submit.

Parameters of the recommender

Setting up the API
- --interface <arg> Interface on which the API listens (default = localhost)
- --port <arg> Port on which the API listens (default = 8080)

Setting the dataset
- --data <arg> Type of dataset
- --dir <arg> Directory containing the dataset files
Supported datasets: movieLens, netflix, netflixInManyFiles

Setting the algorithm
- --method <arg> Algorithm
- -p key=value [key=value]... Parameters for the algorithm
Provided algorithms: kNN, kMeansClusteredKnn, clusterTreeKnn, als

Other
- --products <arg> Maximum number of recommended products (default = 10)
- --help Shows help
- --version Shows version

See the documentation for parameters of a particular algorithm.
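
To make the ordering of arguments concrete, here is a minimal Python sketch that assembles the command line from the parameters listed above. The helper build_submit_command is hypothetical and not part of the project; it only arranges the flags documented in this README, with the spark-submit options before the JAR file and the recommender options after it.

```python
def build_submit_command(spark_submit, jar, data, directory, method,
                         params=None, master="local", driver_memory="2G",
                         executor_memory="6G", interface=None, port=None,
                         products=None):
    """Return the spark-submit invocation as a list of arguments."""
    # Options for spark-submit itself go before the application JAR.
    cmd = [spark_submit, "--class", "Boot", "--master", master,
           "--driver-memory", driver_memory,
           "--executor-memory", executor_memory, jar]

    # Everything after the JAR is passed to the recommender.
    cmd += ["--data", data, "--dir", directory, "--method", method]
    if params:
        cmd.append("-p")
        cmd += [f"{key}={value}" for key, value in params.items()]
    if interface is not None:
        cmd += ["--interface", interface]
    if port is not None:
        cmd += ["--port", str(port)]
    if products is not None:
        cmd += ["--products", str(products)]
    return cmd


command = build_submit_command(
    "spark-submit", "SparkRecommender-assembly-0.1.jar",
    data="movieLens", directory="/tmp", method="kNN",
    params={"numberOfNeighbors": 5}, port=9527,
)
print(" ".join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once spark-submit is on the PATH.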

Example

$SPARK_HOME/bin/spark-submit --class Boot --master local --driver-memory 2G \
--executor-memory 6G SparkRecommender-assembly-0.1.jar \
--data movieLens --dir /mnt/share/movieLens/ \
--method kNN -p numberOfNeighbors=5

To simplify this, the example-run script sets some defaults. When run with the netflix dataset, it expects the following files in the directory given by --dir:

ratings.txt
movie_titles.txt

./example-run --data netflix --dir /mnt/share/datasets/netflix \
 --method kNN -p numberOfNeighbors=5 --port 9090
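
Before starting a run on the netflix dataset, it can help to check that the expected files are in place. The following is a small sketch of such a check; the function missing_netflix_files is a hypothetical helper, not something the example-run script provides.

```python
from pathlib import Path

# File names the README lists for the netflix dataset.
NETFLIX_FILES = ("ratings.txt", "movie_titles.txt")


def missing_netflix_files(directory):
    """Return the expected file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in NETFLIX_FILES if not (base / name).is_file()]
```

An empty list means the directory is ready to be passed as --dir.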

API

Request

The API supports two operations:

Recommend from user ID

  host:port/recommend/fromuserid/?id=<userID, Int>

Example:

  http://localhost:8080/recommend/fromuserid/?id=97

Recommend from ratings (repeat the rating parameter once per rated product)

  host:port/recommend/fromratings/?rating=<productID, Int>,<rating, Double>

Example:

  http://localhost:8080/recommend/fromratings/?rating=98,4&rating=176,5&rating=616,5
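
Both request URLs can be built programmatically. The sketch below uses only Python's standard library; the function names from_user_id_url and from_ratings_url are hypothetical, and the http scheme is taken from the examples above.

```python
from urllib.parse import urlencode


def from_user_id_url(host, port, user_id):
    """URL for recommendations based on an existing user ID."""
    query = urlencode({"id": int(user_id)})
    return f"http://{host}:{port}/recommend/fromuserid/?{query}"


def from_ratings_url(host, port, ratings):
    """URL for recommendations based on (productID, rating) pairs."""
    # One `rating` parameter per rated product, formatted as productID,rating.
    pairs = [("rating", f"{int(product)},{rating:g}")
             for product, rating in ratings]
    # Keep the comma unescaped so the URL matches the documented format.
    query = urlencode(pairs, safe=",")
    return f"http://{host}:{port}/recommend/fromratings/?{query}"


print(from_user_id_url("localhost", 8080, 97))
print(from_ratings_url("localhost", 8080, [(98, 4), (176, 5), (616, 5)]))
```

The two printed URLs are the same as the two examples given above.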

Response

The API returns the recommended products in the form of JSON objects.

The JSON object for one recommendation looks like this:

{
    "product" : productID,
    "rating" : Prediction of rating for this product,
    "name" : "Name of product"
}

Example recommendation of three products:

{"recommendations":[
    {"product":312,"rating":5.0,"name":"High Fidelity (2000)"},
    {"product":494,"rating":5.0,"name":"Monty Python's The Meaning of Life: Special Edition (1983)"},
    {"product":516,"rating":4.0,"name":"Monsoon Wedding (2001)"}
]}
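
A client can decode this response with any JSON library. The following minimal Python sketch extracts the fields shown above; parse_recommendations is a hypothetical helper, and the sample body reuses two entries from the example response.

```python
import json


def parse_recommendations(body):
    """Return (productID, predicted rating, name) tuples from a response body."""
    payload = json.loads(body)
    return [(item["product"], item["rating"], item["name"])
            for item in payload["recommendations"]]


sample = (
    '{"recommendations":['
    '{"product":312,"rating":5.0,"name":"High Fidelity (2000)"},'
    '{"product":516,"rating":4.0,"name":"Monsoon Wedding (2001)"}'
    ']}'
)

for product, rating, name in parse_recommendations(sample):
    print(f"{product}\t{rating}\t{name}")
```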
