Book Recommendation Kaggle dataset: Broadly, recommender systems are algorithms that suggest relevant items to users (movies to watch, text to read, products to buy, or anything else, depending on the industry).
Content:
The Book-Crossing dataset comprises 3 files.
Users:
Contains the users. Note that user IDs (`User-ID`) have been anonymized and map to integers. Demographic data is provided (`Location`, `Age`) if available. Otherwise, these fields contain NULL values.
Books:
Books are identified by their respective `ISBN`. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (`Book-Title`, `Book-Author`, `Year-Of-Publication`, `Publisher`), obtained from Amazon Web Services. Note that in the case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavours (`Image-URL-S`, `Image-URL-M`, `Image-URL-L`), i.e., small, medium, and large. These URLs point to the Amazon website.
Ratings:
Contains the book rating information. Ratings (`Book-Rating`) are either explicit, expressed on a scale from 1 to 10 (higher values denoting higher appreciation), or implicit, expressed by 0.
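Because a rating of 0 marks an implicit interaction rather than a low score, it can help to separate the two kinds before computing averages. The following is a minimal sketch of that split; the helper name `split_ratings` and the sample rows are illustrative assumptions, not part of the dataset or the project code.

```python
import pandas as pd


def split_ratings(ratings):
    """Split a ratings frame into explicit (1-10) and implicit (0) rows."""
    explicit = ratings[ratings["Book-Rating"] > 0]
    implicit = ratings[ratings["Book-Rating"] == 0]
    return explicit, implicit


# Tiny made-up sample using the column names described above.
sample = pd.DataFrame(
    {
        "User-ID": [1, 1, 2],
        "ISBN": ["111", "222", "111"],
        "Book-Rating": [8, 0, 10],
    }
)
explicit, implicit = split_ratings(sample)
```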
Kaggle Dataset Source Link: [https://www.kaggle.com/datasets/arashnic/book-recommendation-dataset?resource=download]
- Build and deploy an ML model (a recommendation system using collaborative filtering) that takes a book name as input and recommends similar books based on the users' collaborative ratings. It also shows the 50 most popular books according to the users' ratings.
Loading the Data: Each CSV dataset (`Books`, `Users`, and `Ratings`) is loaded into a pandas DataFrame.
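A minimal sketch of the loading step might look like the following. The file names `Books.csv`, `Users.csv`, and `Ratings.csv`, the helper name `load_book_crossing`, and the choice to read `ISBN` as a string (so leading zeros are preserved) are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def load_book_crossing(data_dir):
    """Load the three CSV files into pandas DataFrames.

    File names are assumed; adjust them to match the downloaded dataset.
    """
    data_dir = Path(data_dir)
    # Read ISBN as text so identifiers such as "0001" keep their leading zeros.
    books = pd.read_csv(data_dir / "Books.csv", dtype={"ISBN": str}, low_memory=False)
    users = pd.read_csv(data_dir / "Users.csv")
    ratings = pd.read_csv(data_dir / "Ratings.csv", dtype={"ISBN": str})
    return books, users, ratings
```

Calling `load_book_crossing("path/to/data")` (a placeholder path) returns the three frames in the order `books, users, ratings`.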
Data Preparation:
- Cleaning the data
- Handling missing values
- Handling Categorical features
- Feature engineering
- Generated new features such as `num_ratings` and `avg_ratings` after merging the relevant DataFrames to surface insights (e.g., `popular_df` holds the 50 most popular books according to the users' ratings).
- Vectorization of user-book interactions: each book is represented as a vector whose elements are the users' ratings, placed in a high-dimensional space. (No. of relevant users = dimensionality of the space; no. of relevant books = no. of vectors (points) in that space.)
- Built a pandas pivot table for the vectorization.
- Employed collaborative filtering, using cosine similarities between the books, to recommend similar books based on the users' collaborative ratings.
- Defined and implemented the `recommend` function, which takes a book title as input and applies collaborative filtering to return recommendations of similar books.
- Stored (serialized) the relevant DataFrames and models by dumping them in pickle format for loading in `app.py` (e.g., `books.pkl`, `popular.pkl`, `pivot_table.pkl`, `similarity_scores.pkl`).
- Built a Flask app using HTML (for CSS, Bootstrap is used inside the HTML files) with two pages (`index.html` and `recommend.html`) as the UI of the web application.
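The feature engineering, pivot table, cosine similarity, and recommendation steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the helper names, the threshold parameters (`min_ratings`, `min_user_ratings`, `min_book_ratings`), and the sample data are hypothetical, and cosine similarity is computed directly with NumPy here to keep the sketch light on dependencies (a library routine such as scikit-learn's `cosine_similarity` would do the same job).

```python
import numpy as np
import pandas as pd


def build_popular(ratings, books, min_ratings=1, top_n=50):
    """Rank books by average rating among those with enough ratings."""
    merged = ratings.merge(books, on="ISBN")
    stats = (
        merged.groupby("Book-Title")["Book-Rating"]
        .agg(num_ratings="count", avg_ratings="mean")
        .reset_index()
    )
    popular = stats[stats["num_ratings"] >= min_ratings]
    return popular.sort_values("avg_ratings", ascending=False).head(top_n)


def build_pivot(ratings, books, min_user_ratings=1, min_book_ratings=1):
    """Book x user rating matrix; unrated cells are filled with 0."""
    merged = ratings.merge(books, on="ISBN")
    # Keep only "relevant" users and books (thresholds are assumptions).
    user_counts = merged.groupby("User-ID")["Book-Rating"].count()
    active_users = user_counts[user_counts >= min_user_ratings].index
    filtered = merged[merged["User-ID"].isin(active_users)]
    book_counts = filtered.groupby("Book-Title")["Book-Rating"].count()
    rated_books = book_counts[book_counts >= min_book_ratings].index
    filtered = filtered[filtered["Book-Title"].isin(rated_books)]
    pivot = filtered.pivot_table(
        index="Book-Title", columns="User-ID", values="Book-Rating"
    )
    return pivot.fillna(0)


def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of a matrix."""
    values = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero rows
    unit = values / norms
    return unit @ unit.T


def recommend(title, pivot, similarity, top_n=5):
    """Return the titles most similar to `title`, excluding the title itself."""
    if title not in pivot.index:
        return []
    idx = pivot.index.get_loc(title)
    order = np.argsort(similarity[idx])[::-1]  # most similar first
    return [pivot.index[i] for i in order if i != idx][:top_n]


# Tiny made-up example: books A and B are rated alike, C is not.
books = pd.DataFrame({"ISBN": ["111", "222", "333"], "Book-Title": ["A", "B", "C"]})
ratings = pd.DataFrame(
    {
        "User-ID": [1, 2, 1, 2, 3, 1],
        "ISBN": ["111", "111", "222", "222", "333", "333"],
        "Book-Rating": [10, 8, 9, 7, 10, 1],
    }
)
popular_df = build_popular(ratings, books)
pivot = build_pivot(ratings, books)
similarity = cosine_similarity_matrix(pivot)
similar_to_a = recommend("A", pivot, similarity, top_n=1)
```

In this toy data, books A and B receive nearly proportional ratings from the same users, so B is the closest neighbour of A. Note that filling unrated cells with 0 makes them indistinguishable from implicit ratings, which is a simplification of this sketch.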
Render deployed link: [https://recommendation-system-for-books.onrender.com/]
- `app.py`: The main Flask application file.
- `model_files_as_pickle/`: Directory containing the pickled model files.
- `templates/`: HTML templates for rendering the web pages.
  - `index.html`: Homepage displaying the most popular books.
  - `recommend.html`: Page for entering a book name and displaying recommendations.
- `requirements.txt`: A list of Python dependencies required to run the project.
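To show how these pieces could fit together, here is a hedged sketch of an `app.py` that loads pickled artifacts and serves the two pages. The route paths, the form field name `book_title`, the template variable names, and the helpers `load_artifact` and `create_app` are all assumptions for illustration; the project's real `app.py` may be organized differently.

```python
import pickle
from pathlib import Path

from flask import Flask, render_template, request

MODEL_DIR = Path("model_files_as_pickle")


def load_artifact(name, model_dir=MODEL_DIR):
    """Unpickle one artifact (e.g. popular.pkl) from the model directory."""
    with open(Path(model_dir) / name, "rb") as fh:
        return pickle.load(fh)


def create_app(popular_books, recommend_fn):
    """Build the Flask app from already-loaded data and a recommend callable."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        # Homepage: the most popular books.
        return render_template("index.html", books=popular_books)

    @app.route("/recommend", methods=["GET", "POST"])
    def recommend_page():
        # Recommendation page: read the submitted title, if any.
        title = request.form.get("book_title", "")
        results = recommend_fn(title) if title else []
        return render_template("recommend.html", query=title, results=results)

    return app
```

A launcher could then call `create_app(load_artifact("popular.pkl"), some_recommend_function)` and run the returned app. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.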