NextFlex is a cloud-based interactive movie recommendation system. It has two main features: browsing movie data and getting movie recommendations. Guests can explore the movie dataset without any restrictions. Once registered, users can ask NextFlex for personalized recommendations, post their own ratings, and upload images.
Users have four options for getting recommendations:
- A: Get the latest or popular movies list.
- B: Get top-rated movies based on our top ranking model (Global).
- C: Get most similar k movies based on a given movie title and number k (Content-based).
- D: Get personalized recommendations based on user ratings (Collaborative Filtering).
We mainly use The Movies Dataset from Kaggle, which we cleaned, filtered, and merged.
We built our frontend with React, and all the web pages are written in JavaScript. We design the pages with Material-UI components such as icons, buttons, and tables. To make the system more user-friendly, we use the TMDB API to fetch movie data, including poster image links, and then use React to parse the JSON objects and show the movie data on the web page.
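The poster-link step can be sketched as follows. The frontend does this in JavaScript; the sketch is in Python to match the backend code. The helper name `poster_url` and the sample response are hypothetical, and the `w500` size is only one of the widths the TMDB image host serves.

```python
import json

# Public TMDB image host; a size segment such as "w500" goes between
# this base and the "poster_path" value returned by the API.
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/"


def poster_url(movie, size="w500"):
    """Return a full poster URL for one movie object, or None if it has no poster."""
    path = movie.get("poster_path")
    if not path:
        return None
    return f"{TMDB_IMAGE_BASE}{size}{path}"


# Hypothetical, trimmed response body in the shape of a TMDB movie list.
sample_response = json.loads("""
{"results": [
  {"id": 1, "title": "Example Movie", "poster_path": "/abc123.jpg"},
  {"id": 2, "title": "No Poster Yet", "poster_path": null}
]}
""")

# One card per movie, ready to be rendered by the UI.
cards = [
    {"title": m["title"], "poster": poster_url(m)}
    for m in sample_response["results"]
]
```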
The backend is a combination of the Django REST framework and the recommendation engine.
We set up the backend with the Django REST framework and use Axios as a bridge between the frontend and backend. The frontend sends POST or GET requests to the Django API and receives responses as JSON objects, which React then parses and renders on the web page.
Since we created the machine learning engine with Python libraries such as pandas and sklearn, the recommender engine is built into the Django backend.
As stated above, we have built three corresponding prediction functions using Python and Spark: a weighted average matrix for Global recommendations, a cosine similarity matrix for Content-based recommendations, and an Alternating Least Squares (ALS) model from Spark for Collaborative Filtering. We chose Spark over pandas because of the size of our original rating dataset (1.13GB), which can be processed and exported more easily through Spark and Hadoop.
Since Django natively uses a relational database, we use AWS RDS to host the MySQL database for Django’s data management, such as saving the post data that users upload.
NextFlex has different types of functions:
We build the user authentication functions with Redux on the frontend, where Redux stores the user login information in the local session. On the backend, we use Django’s djoser package and JWT (JSON Web Tokens). Each GET or POST request must carry a header with the authentication token that the logged-in user received, and Axios attaches this header automatically.
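To make the token flow concrete, here is a minimal standard-library sketch of how an HS256-signed JWT is created, verified, and placed in a request header. It is not the djoser implementation: the function names, the `user_id` claim, and the `JWT` header prefix are assumptions for illustration, and the actual prefix depends on the backend settings.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes, now=None):
    """Return the payload if the signature is valid and the token is unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        return None
    payload = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if payload.get("exp", float("inf")) < now:
        return None
    return payload


def auth_headers(token: str, prefix: str = "JWT") -> dict:
    """Header that the HTTP client attaches to each GET or POST request (prefix is an assumption)."""
    return {"Authorization": f"{prefix} {token}"}
```

The server only needs the shared secret to check a token, so it does not have to look up a session for every request.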
Users can either rate existing movies in the dataset, or rate their favorite movies by posting their own data and uploading images to the cloud database. We also added a function to extract image metadata: when a user uploads an image, the system automatically reads its metadata, such as width, height, size, name, and dates.
As stated, NextFlex is an interactive movie recommendation system. All the recommendation functions can be called intuitively through the users’ inputs or clicks.
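The metadata step could look roughly like the sketch below. The source does not say which library performs the extraction, so this version uses only the standard library and handles PNG files only. The function name `png_metadata`, the returned field names, and the use of the upload time as the date are assumptions.

```python
import struct
from datetime import datetime, timezone

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(name: str, data: bytes) -> dict:
    """Extract basic metadata from the raw bytes of an uploaded PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then the IHDR data, which begins with width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return {
        "name": name,
        "width": width,
        "height": height,
        "size_bytes": len(data),
        # Assumption: the recorded date is the upload time, in UTC.
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }
```

A library such as Pillow would cover more formats; the point here is only that the dimensions sit at a fixed offset in the file header and can be read without decoding the image.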
- Option A: Get the latest or popular movies list
The TMDB API movie list is the simplest function: users browse the movie lists, shown with their posters in the UI. The other three recommendation engines are triggered by user interactions.
- Option B: Get top-rated movies
This recommender is based on our top-ranking model over all users’ ratings. It queries S3 and takes the average of the global user ratings, with a weighting applied: a fairly new movie will not have many ratings yet, and if it is actually good, we do not want to downgrade it just because it has fewer ratings. We also ignore movies with extremely few ratings and consider only the first 90% of films.
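One common way to implement this kind of weighting is the IMDB-style weighted rating, sketched below. The source does not state the exact formula, so treat this as an assumption: the function names, the field names, and the `min_count` threshold are all hypothetical. The score blends a movie's own average with the global mean, so a movie with few ratings is pulled toward the global mean rather than ranked on its raw average alone.

```python
def weighted_rating(avg, count, global_mean, min_count):
    """Blend a movie's own average with the global mean.

    The more ratings a movie has relative to min_count, the more its
    own average dominates the score.
    """
    total = count + min_count
    return (count / total) * avg + (min_count / total) * global_mean


def top_rated(movies, min_count, k=10):
    """Rank movies by weighted rating, skipping those with too few ratings.

    `movies` is a list of dicts with hypothetical keys:
    "title", "avg_rating", and "rating_count".
    """
    total_ratings = sum(m["rating_count"] for m in movies)
    # Global mean over all individual ratings.
    global_mean = sum(m["avg_rating"] * m["rating_count"] for m in movies) / total_ratings
    eligible = [m for m in movies if m["rating_count"] >= min_count]
    scored = [
        (weighted_rating(m["avg_rating"], m["rating_count"], global_mean, min_count), m["title"])
        for m in eligible
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Made-up sample data for illustration.
movies = [
    {"title": "Established Classic", "avg_rating": 4.4, "rating_count": 5000},
    {"title": "Strong Newcomer", "avg_rating": 4.8, "rating_count": 60},
    {"title": "Obscure Title", "avg_rating": 5.0, "rating_count": 2},
    {"title": "Average Hit", "avg_rating": 3.5, "rating_count": 3000},
]
ranking = top_rated(movies, min_count=50, k=3)
```

With this sample, the well-rated newcomer still ranks highly despite having only 60 ratings, while the title with just two ratings is excluded.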
- Option C: Get most similar k movies
This recommendation is based on the movies metadata dataset. The user inputs a movie title and a number k, and the recommender finds the k movies most similar to the specified one. It first queries S3 and reads the CSV dataset, then uses pandas for data processing: extracting key attributes such as crews, directors, and genres. Next it creates a "soup" by merging all extracted features into one single feature, ready for sklearn. Finally, sklearn calculates the cosine similarity matrix, which is sorted to output the top results. Note that when the Django server starts, the recommender engine runs automatically and computes the cosine similarity matrix only once. After that, the server holds the ML features in memory and waits for requests from the frontend, so results appear on the UI almost instantly.
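The soup-and-similarity pipeline can be sketched as follows. To keep the example dependency-free, token counts and cosine similarity are computed by hand, as a stand-in for the sklearn vectorizer and cosine similarity step. The class name, the field names, and the sample movies are hypothetical.

```python
import math
from collections import Counter


def build_soup(movie):
    """Merge the extracted features of one movie into a single token string."""
    tokens = [movie["director"]] + movie["cast"] + movie["genres"]
    # Lowercase and remove spaces so a full name becomes one token.
    return " ".join(t.lower().replace(" ", "") for t in tokens)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ContentRecommender:
    def __init__(self, movies):
        self.titles = [m["title"] for m in movies]
        vectors = [Counter(build_soup(m).split()) for m in movies]
        # Full pairwise matrix, computed once and kept in memory.
        self.sim = [[cosine(u, v) for v in vectors] for u in vectors]

    def most_similar(self, title, k):
        """Return the k titles most similar to `title`, excluding itself."""
        i = self.titles.index(title)
        others = (j for j in range(len(self.titles)) if j != i)
        ranked = sorted(others, key=lambda j: self.sim[i][j], reverse=True)
        return [self.titles[j] for j in ranked[:k]]


# Made-up sample data for illustration.
movies = [
    {"title": "Space Saga", "director": "Ann Lee", "cast": ["Sam Roe", "Kim Vu"], "genres": ["Sci-Fi", "Adventure"]},
    {"title": "Space Saga II", "director": "Ann Lee", "cast": ["Sam Roe"], "genres": ["Sci-Fi", "Action"]},
    {"title": "Quiet Town", "director": "Bo Chen", "cast": ["Kim Vu"], "genres": ["Drama"]},
    {"title": "Laugh Track", "director": "Dee Park", "cast": ["Al Moss"], "genres": ["Comedy"]},
]
recommender = ContentRecommender(movies)  # built once, e.g. at server start
similar = recommender.most_similar("Space Saga", k=2)
```

Building the recommender object once and reusing it for every request mirrors the compute-once behaviour described above: each lookup is just a sort over one precomputed row.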
- Option D: Get personalized recommendations based on user ratings
The last recommender uses the currently logged-in user’s ratings. It relies on an ALS model from Spark; ALS is a commonly used, powerful yet simple model for collaborative filtering. Another reason we chose Spark over pandas is that our original rating dataset is relatively large (1.13GB), which can be processed and exported more easily through Spark and Hadoop. We first trained our ALS model on Google Colab so that teammates could share code instantly. We fed the model three attributes: user_id, movie_id, and user_rating. Since we have more than 270,000 data points, we are confident that the model is decent. We then tested and evaluated the ALS model, exported it, and uploaded it to S3. Each time a user asks for a recommendation based on their own ratings, we call the model and return the top-ranked movies.
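At recommendation time, a matrix-factorization model such as ALS scores a (user, movie) pair as the dot product of a learned user vector and a learned movie vector. The sketch below illustrates only that scoring step in plain Python. It is not the Spark API, and the factor values, IDs, and function names are made up.

```python
def predict_rating(user_vec, item_vec):
    """Predicted rating is the dot product of the two latent-factor vectors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def recommend(user_id, user_factors, item_factors, already_rated, k=5):
    """Return the k unrated movie IDs with the highest predicted rating."""
    user_vec = user_factors[user_id]
    scored = [
        (predict_rating(user_vec, vec), movie_id)
        for movie_id, vec in item_factors.items()
        if movie_id not in already_rated
    ]
    scored.sort(reverse=True)
    return [movie_id for _, movie_id in scored[:k]]


# Hypothetical two-dimensional factors, as a trained model might produce.
user_factors = {42: [1.0, 0.2]}
item_factors = {
    101: [4.0, 1.0],
    102: [1.0, 4.0],
    103: [3.0, 0.5],
    104: [0.5, 0.5],
}
picks = recommend(42, user_factors, item_factors, already_rated={101}, k=2)
```

Training is the expensive part, since ALS alternates between solving for the user vectors and the movie vectors. Serving only needs the stored factors, which fits the approach of exporting the trained model and loading it on request.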
- CloudDB: AWS S3 for dataset storage, AWS RDS for the Django database
- Data Processing & Recommender Engine: Pandas, NumPy, Sklearn, PySpark, Spark ALS
- UI design: Django REST framework / Djoser JWT, React, Material-UI, Redux, Axios, TMDB API