This Apache Spark tutorial will guide you step by step through using the MovieLens dataset to build a movie recommender with collaborative filtering, based on Spark's Alternating Least Squares (ALS) implementation. It is organised in two parts. The first is about getting and parsing movie and rating data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system.
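Spark MLlib exposes ALS as `ALS.train` in `pyspark.mllib.recommendation`, which takes an RDD of (user, movie, rating) tuples. To show what the alternating steps compute, here is a minimal single-machine NumPy sketch on a tiny, made-up rating matrix. It is an illustration only, not Spark's distributed implementation, and the function name `als` and the sample ratings are hypothetical. With the movie factors fixed, each user's factor vector is a small regularized least-squares problem over the movies that user rated, and vice versa.

```python
import numpy as np


def als(ratings, mask, rank=2, reg=0.1, iterations=10, seed=0):
    """Factor ratings ~ U @ V.T, fitting only the entries where mask is True.

    Plain L2-regularized alternating least squares on a dense matrix;
    an illustration only, not Spark's distributed implementation.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # movie factors
    ridge = reg * np.eye(rank)
    for _ in range(iterations):
        # Fix V: each user's vector is a small ridge regression on the
        # movies that user actually rated.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ ratings[u, mask[u]])
        # Fix U: solve for each movie's vector in the same way.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ ratings[mask[:, i], i])
    return U, V


# Tiny made-up example: rows are users, columns are movies, 0 means "not rated".
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
M = R > 0
U, V = als(R, M, rank=2, reg=0.1, iterations=50)
predicted = U @ V.T  # unrated cells now hold predicted ratings
rmse = np.sqrt(np.mean((predicted[M] - R[M]) ** 2))
print(round(rmse, 3))
```

Each half-step solves its block exactly, so the regularized objective never increases from one iteration to the next; the filled-in cells of `predicted` are what a recommender would rank to suggest unseen movies.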
This tutorial can be used on its own to build a movie recommender model based on the MovieLens dataset. Neither this dataset nor this algorithm is new (I recommend, for example, this EdX course, or a Google search); that is why we put the emphasis on ending up with a model that is usable in an on-line environment, and on how to use it in different situations. The second part of the tutorial explains how to use Python/Flask to build a web service on top of Spark models. By doing so, you will be able to develop a complete on-line movie recommendation service.
Part I: Building the recommender
Part II: Building and running the web service
The file server/server.py starts a CherryPy server running the Flask app in app.py, which provides a RESTful web server wrapping the Spark-based engine (and its Spark context) defined in engine.py. Through its API we can perform on-line movie recommendations.
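The Flask layer of this architecture can be sketched as follows. This is a minimal sketch assuming a hypothetical engine object with a `get_top_ratings(user_id, count)` method and a hypothetical route layout; the real `engine.py` and `app.py` may differ. A stub engine stands in for Spark so the web layer can be exercised on its own with Flask's test client.

```python
import json

from flask import Flask, jsonify


class StubEngine:
    """Hypothetical stand-in for the Spark-based engine in engine.py."""

    def __init__(self, predictions):
        # predictions: {user_id: [(movie_title, predicted_rating), ...]}
        self.predictions = predictions

    def get_top_ratings(self, user_id, count):
        # Highest predicted ratings first; unknown users get an empty list.
        ranked = sorted(self.predictions.get(user_id, []),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:count]


def create_app(engine):
    """Build a Flask app whose routes delegate to the given engine."""
    app = Flask(__name__)

    # Hypothetical route layout: GET /<user_id>/ratings/top/<count>
    @app.route("/<int:user_id>/ratings/top/<int:count>", methods=["GET"])
    def top_ratings(user_id, count):
        top = engine.get_top_ratings(user_id, count)
        return jsonify({"user_id": user_id,
                        "top": [{"title": t, "rating": r} for t, r in top]})

    return app


app = create_app(StubEngine({1: [("Toy Story", 4.1), ("Heat", 3.2), ("Alien", 4.7)]}))
client = app.test_client()  # exercise the API without starting a server
body = json.loads(client.get("/1/ratings/top/2").data)
print(body)
```

Passing the engine into `create_app` keeps the routes independent of Spark: the same app can be backed by the Spark-based engine when served and by a stub in tests. Since `app` is a WSGI application, CherryPy can host it, for example by mounting it with `cherrypy.tree.graft(app, "/")`.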
Please refer to the second notebook for detailed instructions on how to run and use the service.