This project analyzes book data retrieved from the Google Books API and builds a content-based recommendation system from the book descriptions. Data cleaning and preprocessing are done in R; data visualization and the recommendation system are implemented in Python.
The book data is retrieved from the Google Books API. Each record includes the book's title, author, description, rating, and other details.
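As a rough illustration of the retrieval step, the sketch below queries the public volumes endpoint of the Google Books API and flattens each result into a record with the fields listed above. The helper names (`build_query_url`, `parse_volume`, `fetch_books`) and the choice of output keys are assumptions made for this example, not the project's actual code.

```python
import json
import urllib.parse
import urllib.request

# Public search endpoint of the Google Books API.
API_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(query, max_results=10):
    """Build a volumes search URL for the given query string."""
    params = urllib.parse.urlencode({"q": query, "maxResults": max_results})
    return f"{API_URL}?{params}"


def parse_volume(item):
    """Flatten one entry of the response's `items` list into a plain record.

    Missing fields become None (or an empty string for authors) rather
    than raising, since not every volume carries a description or rating.
    """
    info = item.get("volumeInfo", {})
    return {
        "title": info.get("title"),
        "authors": ", ".join(info.get("authors", [])),
        "description": info.get("description"),
        "rating": info.get("averageRating"),
    }


def fetch_books(query, max_results=10):
    """Query the API (network access required) and return a list of records."""
    with urllib.request.urlopen(build_query_url(query, max_results)) as resp:
        payload = json.load(resp)
    return [parse_volume(item) for item in payload.get("items", [])]
```

Calling `fetch_books("science fiction")` would return a list of dictionaries that can be written to CSV for the cleaning step. Keeping the parsing separate from the network call makes it possible to check the field mapping against a saved response.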
The retrieved data is cleaned and preprocessed in R, using packages such as skimr, dplyr, and IRdisplay. Cleaning removes duplicates, missing values, and irrelevant columns. Preprocessing then applies text normalization, tokenization, and stop-word removal.
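The project performs this step in R. Purely to illustrate what the cleaning and preprocessing steps amount to, here is a minimal Python sketch of the same operations. The function names, the record layout, and the tiny stop-word list are assumptions for this example; a real pipeline would use a fuller stop-word list.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "it", "for", "with"}


def preprocess(text):
    """Normalize, tokenize, and remove stop words from one description."""
    # Normalization: lowercase and replace anything that is not a letter
    # or whitespace with a space.
    normalized = re.sub(r"[^a-z\s]", " ", text.lower())
    # Tokenization: split on runs of whitespace.
    tokens = normalized.split()
    # Stop-word removal.
    return [tok for tok in tokens if tok not in STOP_WORDS]


def clean_records(records):
    """Drop records with no description and records with a repeated title."""
    seen = set()
    cleaned = []
    for rec in records:
        title = rec.get("title")
        if not rec.get("description") or title in seen:
            continue
        seen.add(title)
        cleaned.append(rec)
    return cleaned
```

For example, `preprocess("The Lord of the Rings, in 3 volumes!")` yields the tokens `lord`, `rings`, and `volumes`.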
Data visualization is performed in Python with packages such as matplotlib, seaborn, and plotly. The plots include bar charts, scatter plots, and histograms.
The content-based recommendation system is implemented in Python. It computes the cosine similarity between book descriptions and, given a book, recommends the books whose descriptions are most similar to it.
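The idea can be sketched with the standard library alone. The example below assumes each description is turned into a vector of raw term counts; the project does not state how descriptions are vectorized, so that choice (rather than, say, TF-IDF weighting) is an assumption, as are the function names and the `descriptions` mapping from title to text.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # A Counter returns 0 for terms it does not contain.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty description is similar to nothing.
    return dot / (norm_a * norm_b)


def recommend(title, descriptions, top_n=3):
    """Rank the other books by description similarity to `title`."""
    # Assumed vectorization: lowercase, split on whitespace, count terms.
    vectors = {t: Counter(d.lower().split()) for t, d in descriptions.items()}
    target = vectors[title]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != title
    ]
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]
```

Cosine similarity depends only on the direction of the vectors, not their length, so a long description and a short one that use the same vocabulary in similar proportions still score as close.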