Primary language: Scala. License: MIT.

Spark Recommender

Scalable recommendation system written in Scala using the Apache Spark framework.

Implemented algorithms include:

  • k-nearest neighbors
  • k-nearest neighbors with clustering
  • k-nearest neighbors with a cluster tree
  • Alternating Least Squares (ALS) from Spark's MLlib

This first version was created during the eClub Summer Camp 2014 at Czech Technical University.
The benchmark results and the documentation are in reportAndDocumentation.pdf.

Build

Spark Recommender is built with sbt (Simple Build Tool). Run:

sbt assembly

This creates the JAR file in the target/scala-2.10/ directory.

Run

The application can be run using the spark-submit script.

cd target/scala-2.10/

$SPARK_HOME/bin/spark-submit --class Boot --master local --driver-memory 2G --executor-memory 6G SparkRecommender-assembly-0.1.jar (+ parameters of the recommender)

For example:

/opt/mapr/spark/spark-1.4.1/bin/spark-submit --class Boot --master local[*] --driver-memory 2G --executor-memory 6G SparkRecommender-assembly-0.1.jar --data movieLens --dir /tmp --method kNN -p numberOfNeighbors=5 --interface 0.0.0.0 --port 9527

See the Spark documentation for the parameters of spark-submit.

Parameters of the recommender

  • Setting up API

    • --interface <arg> Interface for setting up API (default = localhost)
    • --port <arg> Port of interface for setting up API (default = 8080)
  • Setting the dataset

    • --data <arg> Type of dataset
    • --dir <arg> Directory containing files of dataset

    Supported datasets: movieLens, netflix, netflixInManyFiles

  • Setting the algorithm

    • --method <arg> Algorithm
    • -p key=value [key=value]... Parameters for algorithm

    Provided algorithms: kNN, kMeansClusteredKnn, clusterTreeKnn, als

  • Other

    • --products <arg> Maximum number of recommended products (default = 10)
    • --help Shows help
    • --version Shows version

See the documentation for parameters of a particular algorithm.

Example

$SPARK_HOME/bin/spark-submit --class Boot --master local --driver-memory 2G \
--executor-memory 6G SparkRecommender-assembly-0.1.jar \
--data movieLens --dir /mnt/share/movieLens/ \
--method kNN -p numberOfNeighbors=5
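
When launching the recommender from a script, argument order matters: spark-submit treats everything after the application JAR as arguments for the application, so Spark options such as --class must come before the JAR. The following Python sketch assembles the command as an argument list. The function name `build_submit_command` and the `/opt/spark` path are hypothetical and used only for illustration.

```python
import os
import shlex


def build_submit_command(jar, recommender_args, spark_home=None,
                         main_class="Boot", master="local",
                         driver_memory="2G", executor_memory="6G"):
    """Return a spark-submit argument list with Spark options before the JAR."""
    # Fall back to the SPARK_HOME environment variable if no path is given.
    spark_home = spark_home or os.environ.get("SPARK_HOME", "")
    submit = os.path.join(spark_home, "bin", "spark-submit")
    spark_options = [
        "--class", main_class,
        "--master", master,
        "--driver-memory", driver_memory,
        "--executor-memory", executor_memory,
    ]
    # Everything after the JAR is passed to the recommender itself.
    return [submit, *spark_options, jar, *recommender_args]


command = build_submit_command(
    "SparkRecommender-assembly-0.1.jar",
    ["--data", "movieLens", "--dir", "/mnt/share/movieLens/",
     "--method", "kNN", "-p", "numberOfNeighbors=5"],
    spark_home="/opt/spark",  # hypothetical install location
)
print(shlex.join(command))
```

The list can be passed directly to `subprocess.run(command)`, which avoids shell quoting problems with values such as `local[*]`.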

For convenience, there is an example-run script that sets some defaults. When running with the netflix datasets, it expects the following files to be located in --dir:

  • ratings.txt
  • movie_titles.txt

./example-run --data netflix --dir /mnt/share/datasets/netflix \
 --method kNN -p numberOfNeighbors=5 --port 9090

API

Request

The API supports two operations:

  • Recommend from user ID

      host:port/recommend/fromuserid/?id=<userID, Int>
    

    Example:

      http://localhost:8080/recommend/fromuserid/?id=97
    
  • Recommend from ratings

       host:port/recommend/fromratings/?rating=<productID, Int>,<rating, Double>
    

    Example:

       http://localhost:8080/recommend/fromratings/?rating=98,4&rating=176,5&rating=616,5
    
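
As a minimal client-side sketch, the two request URLs can be built in Python with the standard library. The helper names `from_user_id_url` and `from_ratings_url` are hypothetical, and the `http` scheme is taken from the examples above. The `rating` parameter is repeated once per rated product, as in the example URL.

```python
from urllib.parse import urlencode


def from_user_id_url(host, port, user_id):
    """URL that asks for recommendations for an existing user ID."""
    query = urlencode({"id": int(user_id)})
    return f"http://{host}:{port}/recommend/fromuserid/?{query}"


def from_ratings_url(host, port, ratings):
    """URL that asks for recommendations from (productID, rating) pairs."""
    # One "rating=<productID>,<rating>" parameter per pair.
    pairs = [("rating", f"{int(product_id)},{rating}")
             for product_id, rating in ratings]
    # safe="," keeps the comma literal instead of encoding it as %2C.
    query = urlencode(pairs, safe=",")
    return f"http://{host}:{port}/recommend/fromratings/?{query}"


print(from_user_id_url("localhost", 8080, 97))
print(from_ratings_url("localhost", 8080, [(98, 4), (176, 5), (616, 5)]))
```

The resulting strings can be fetched with any HTTP client, for example `urllib.request.urlopen`.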

Response

The API returns the recommended products as JSON objects.

The JSON object for one recommendation looks like this:

{
    "product" : productID,
    "rating" : predicted rating for this product,
    "name" : "Name of product"
}

Example recommendation of three products:

{"recommendations":[
    {"product":312,"rating":5.0,"name":"High Fidelity (2000)"},
    {"product":494,"rating":5.0,"name":"Monty Python's The Meaning of Life: Special Edition (1983)"},
    {"product":516,"rating":4.0,"name":"Monsoon Wedding (2001)"}
]}
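
A response of this shape can be decoded with a few lines of Python. This is a sketch that assumes the body always contains a top-level "recommendations" array, as in the example above. The `Recommendation` type and `parse_recommendations` function are hypothetical names, not part of the project.

```python
import json
from typing import NamedTuple


class Recommendation(NamedTuple):
    product: int    # product ID
    rating: float   # predicted rating for the product
    name: str       # product name


def parse_recommendations(body):
    """Turn the API's JSON response text into a list of Recommendation tuples."""
    payload = json.loads(body)
    return [
        Recommendation(int(item["product"]), float(item["rating"]), item["name"])
        for item in payload["recommendations"]
    ]


# The example response shown above, as a string.
sample = """{"recommendations":[
    {"product":312,"rating":5.0,"name":"High Fidelity (2000)"},
    {"product":494,"rating":5.0,"name":"Monty Python's The Meaning of Life: Special Edition (1983)"},
    {"product":516,"rating":4.0,"name":"Monsoon Wedding (2001)"}
]}"""

for rec in parse_recommendations(sample):
    print(f"{rec.product}\t{rec.rating}\t{rec.name}")
```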