Anime recommendation system on MAL dataset

This is my first recommendation system using collaborative filtering. I used only NearestNeighbors from scikit-learn rather than a dedicated library such as scikit-surprise because I wanted to challenge myself a little.
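
To illustrate the idea, here is a minimal sketch of item-based collaborative filtering with NearestNeighbors. It is not the code from this repository: the titles, the ratings, the cosine metric, and the recommend helper are all assumptions made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy item-user rating matrix: rows are titles, columns are users,
# 0 means "not rated". Titles and ratings are invented for illustration.
titles = ["A", "B", "C", "D"]
ratings = csr_matrix(np.array([
    [9, 8, 0, 0, 7],
    [8, 9, 0, 0, 6],
    [0, 0, 9, 8, 0],
    [0, 1, 8, 9, 0],
], dtype=np.float32))

# Brute-force search with cosine distance accepts sparse input directly.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(ratings)

def recommend(title_idx, k=2):
    """Return the k titles whose rating vectors are closest to the given title."""
    # Ask for k + 1 neighbours because the query row is its own nearest neighbour.
    _, indices = model.kneighbors(ratings[title_idx], n_neighbors=k + 1)
    return [titles[i] for i in indices[0] if i != title_idx][:k]

print(recommend(0, k=1))
```

Titles that were rated similarly by the same users end up close together, so the nearest neighbours of a title serve as its recommendations.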

Dataset used: MyAnimeList Dataset.

Project structure

preprocessing: source files and notebooks to preprocess the original dataset and build the sparse user-movies matrix.
web: a simple Flask application for demonstrating the system. For simplicity, the method I used for serving is quite ad-hoc.
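
As a rough sketch of what building a sparse matrix from a long table of ratings can look like (not necessarily how the notebooks here do it), the snippet below maps IDs to row and column positions with pandas categoricals. The build_sparse_matrix helper, the column names, and the sample rows are assumptions for illustration, and each (title, user) pair is assumed to appear at most once.

```python
import pandas as pd
from scipy.sparse import csr_matrix

def build_sparse_matrix(df, row_col, col_col, value_col):
    """Turn a long ratings table into a CSR matrix plus its row and column labels."""
    rows = df[row_col].astype("category")
    cols = df[col_col].astype("category")
    matrix = csr_matrix(
        (
            df[value_col].to_numpy(dtype="float32"),
            # Category codes are the integer positions of each label.
            (rows.cat.codes.to_numpy().astype("int32"),
             cols.cat.codes.to_numpy().astype("int32")),
        ),
        shape=(len(rows.cat.categories), len(cols.cat.categories)),
    )
    return matrix, rows.cat.categories.tolist(), cols.cat.categories.tolist()

# Invented sample data; the column names are assumed, not taken from the repository.
sample = pd.DataFrame({
    "username": ["u1", "u1", "u2", "u3"],
    "anime_id": [1, 20, 20, 5],
    "my_score": [8, 9, 7, 10],
})
matrix, anime_ids, users = build_sparse_matrix(sample, "anime_id", "username", "my_score")
print(matrix.shape, matrix.nnz)
```

Only the rated cells are stored, which is what makes a matrix with many users and titles fit in memory at all.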

How to

Download the following files from the Kaggle dataset above:
- anime_cleaned.csv.zip
- UserAnimeList.csv.zip
Put them into the preprocessing folder (do not extract them).
Run preprocessing/build.sh.
After building, copy the following files to web/models/:
- anime_db.parquet
- movies_index.pkl
- user_anime_pivot.pkl
Run web/app.py

Note: the user-anime dataset consists of around 80 million rows, so make sure that you have enough memory to run the build script. It ran fine on my laptop with 16 GB of RAM.
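
If memory is tight, one option is to read the ratings in chunks and keep only the columns you need. The sketch below is an assumption-laden example rather than what build.sh does: the load_ratings helper, the column names (username, anime_id, my_score), and the rule that a score of 0 means "not rated" are all assumed here.

```python
import io
import pandas as pd

def load_ratings(source, chunksize=1_000_000):
    """Read the ratings in chunks, keeping only rated rows and compact dtypes."""
    chunks = pd.read_csv(
        source,
        usecols=["username", "anime_id", "my_score"],  # assumed column names
        dtype={"anime_id": "int32", "my_score": "int8"},
        chunksize=chunksize,
    )
    # Filter each chunk before concatenating so the full table is never held at once.
    kept = [chunk[chunk["my_score"] > 0] for chunk in chunks]
    ratings = pd.concat(kept, ignore_index=True)
    # Repeated usernames are far cheaper to store as a categorical column.
    ratings["username"] = ratings["username"].astype("category")
    return ratings

# Tiny in-memory stand-in for the real file, invented for the example.
sample = io.StringIO(
    "username,anime_id,my_score,my_status\n"
    "u1,1,8,2\n"
    "u1,20,0,6\n"
    "u2,20,7,2\n"
)
ratings = load_ratings(sample, chunksize=2)
print(ratings.dtypes)
```

pandas can also read a zip archive directly when it contains a single CSV file, so the same function should accept a path to a .zip file instead of the in-memory buffer.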

You should also make sure that the Python version for building the dataset and serving the Flask application is the same.
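
One way to catch a mismatch early is to store the Python version next to the pickled object and compare it when loading. This is only a sketch of that idea, not something the repository does: save_with_version, load_with_version, and the demo file name are hypothetical.

```python
import pickle
import sys
import tempfile
from pathlib import Path

def save_with_version(obj, path):
    """Pickle obj together with the major.minor Python version that produced it."""
    payload = {"python": tuple(sys.version_info[:2]), "data": obj}
    with open(path, "wb") as f:
        pickle.dump(payload, f)

def load_with_version(path):
    """Load a payload written by save_with_version, failing on a version mismatch."""
    with open(path, "rb") as f:
        payload = pickle.load(f)
    built_with = tuple(payload["python"])
    running = tuple(sys.version_info[:2])
    if built_with != running:
        raise RuntimeError(
            f"File was built with Python {built_with}, but this is Python {running}"
        )
    return payload["data"]

# Round trip in a temporary directory with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "index_demo.pkl"
    save_with_version({"example title": 0}, demo_path)
    restored = load_with_version(demo_path)

print(restored)
```

The check only covers the interpreter version; library versions (for example pandas or scikit-learn) can matter for pickled objects too.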

hungngocphat01/MAL-Recommender-kNN
