Anime recommendation system on MAL dataset
This is my first recommendation system using collaborative filtering. I only used NearestNeighbors
from sklearn
instead of dedicated libraries like scikit-surprise
because I wanted to challenge myself a little bit.
Dataset used: MyAnimeList Dataset.
Project structure
preprocessing
: source files and notebooks to the preprocess original dataset and build the sparse user-movies matrix.web
: a simple Flask application for demonstrating the system. For simplicity, the method I used for serving is quite ad-hoc.
How to
- Download the following files from the above Kaggle dataset
anime_cleaned.csv.zip
UserAnimeList.csv.zip
- Put them into
preprocessing
folder (do not extract). - Run
preprocessing/build.sh
. - After building, copy following files to
web/models/
anime_db.parquet
movies_index.pkl
user_anime_pivot.pkl
- Run
web/app.py
Note: the user-anime dataset consists of around 80 million rows, so make sure that you have enough memory to run the build script. It ran fine on my laptop with 16 GB of RAM.
You should also make sure that the Python version for building the dataset and serving the Flask application is the same.