Currently, the webapp and the underlying recommender model work together to recommend the books most similar in content to the user's input, using cosine similarity. The system is built on the popular 7k Books dataset.
Step 1: Identification of less valuable columns
books = df.drop(columns=['isbn10', 'subtitle', 'thumbnail', 'published_year', 'num_pages'])
Step 2: Transformation of 'ratings'-related columns
The 'average_rating' and 'ratings_count' columns were combined into a new 'popularity' score, which was then bucketed from quantitative values into categorical values for better training.
books['popularity'] = (books['average_rating'] * books['ratings_count']) / 100000
books['popularity'] = books['popularity'].apply(changeRating)
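The `changeRating` helper used above is not shown in this section; the sketch below illustrates one way it could bucket the numeric score into categories. The threshold values and labels here are assumptions, not the repo's actual ones.

```python
# Hypothetical sketch of changeRating: the real thresholds and labels
# live in the project code, so these bins are illustrative only.
def changeRating(popularity):
    """Bucket a numeric popularity score into a categorical label."""
    if popularity >= 10:
        return "High"
    elif popularity >= 1:
        return "Medium"
    else:
        return "Low"
```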
Step 3: Treatment of missing values
books.dropna(inplace=True)
Step 4: Stemming
Stemming is a widely used NLP technique that reduces individual words to their root forms so that the model treats words like 'reads' and 'reading' as similar in meaning. NLTK's PorterStemmer is used for this purpose.
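A minimal illustration of PorterStemmer on the kind of word pair mentioned above (the word list is just an example):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["reads", "reading", "read"]
# All three variants collapse to the same stem, "read".
stems = [stemmer.stem(w) for w in words]
```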
Step 5: Count Vectorization
After the preprocessing steps, scikit-learn's CountVectorizer is applied to the final dataframe to build word vectors based on word counts.
from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(max_features=4000, stop_words='english')
vector = vect.fit_transform(books['label']).toarray()
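On a toy corpus (standing in for the preprocessed 'label' column), count vectorization produces one row per book and one column per vocabulary term:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three toy "label" strings; the real column is built during preprocessing.
docs = ["magic school wizard", "wizard war magic", "space war ship"]
vect = CountVectorizer()
vector = vect.fit_transform(docs).toarray()
# vector has shape (3 books, 6 vocabulary terms); each cell is a word count.
```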
Step 6: Cosine Similarity
Cosine similarity measures how similar two vectors are based on the cosine of the angle between them. scikit-learn's cosine_similarity function takes the count vectors generated in Step 5 as input and outputs a matrix showing the proximity of every data point to every other.
from sklearn.metrics.pairwise import cosine_similarity

proximityVector = cosine_similarity(vector)
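Once the proximity matrix exists, recommendations reduce to sorting one row of it. The helper below is a hypothetical sketch (the function name, the 'title' column, and the row-order assumption are mine, not necessarily the app's):

```python
import numpy as np
import pandas as pd

def recommend(title, books, proximityVector, n=5):
    """Return the n titles most similar to `title`.

    Assumes `books` has a default integer index aligned with the rows
    of `proximityVector` and a 'title' column.
    """
    idx = books.index[books['title'] == title][0]
    # Pair each row index with its similarity score, highest first.
    scores = sorted(enumerate(proximityVector[idx]),
                    key=lambda pair: pair[1], reverse=True)
    # Drop the book itself, keep the top n matches.
    top = [i for i, _ in scores if i != idx][:n]
    return books['title'].iloc[top].tolist()
```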
- Switch to the deploy branch.
- Clone the repo to your local directory.
- Navigate to that directory.
- Install the environment requirements:
pip install -r requirements.txt
- Start the Flask server:
flask run
Deployed on Heroku
Re-deployed on PythonAnywhere