The Anatomy of a Goodreads Book Rating: Book Feature to Rating Analysis

Data Mining and Wrangling Course Submission

See full report HERE

Executive Summary

Goodreads is the leading book review website, trusted by millions of users worldwide. In this project, our group used various data mining, wrangling, and visualization techniques to analyze book features and their relationship with book ratings. The results show that the most likely predictors of book rating are the number of pages and the number of ratings. Insights on other feature interactions also emerged. E-books are more prevalent in the romance genre and scarce among children's books and comics, and reading times are shorter for the e-book format. The data also validated two common notions: books with more pages take longer to read, and text reviews occur more often than non-text reviews.
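
The summary does not state how the predictors of rating were identified. As a minimal sketch only, assuming each book is a dictionary with hypothetical numeric keys such as `num_pages`, `ratings_count`, and `average_rating` (these names are assumptions, not the project's actual schema), candidate features can be ranked by the strength of their linear correlation with the rating:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(rows, target, features):
    """Rank features by |Pearson r| with the target column, strongest first."""
    ys = [row[target] for row in rows]
    scores = {f: pearson([row[f] for row in rows], ys) for f in features}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical usage (key names are assumptions, not the project's schema):
# rank_features(books, "average_rating", ["num_pages", "ratings_count"])
```

Pearson's r only measures linear association, so a heavily skewed count such as the number of ratings may call for a log transform or a rank-based (Spearman) correlation before ranking.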

Since the dataset contains only a limited set of book features, future studies should also extract reviewer profiles, such as age, gender, and other demographic and psychometric information. For the current dataset, applying the methodology to a larger portion of the database would help validate this study and yield more accurate information. Machine learning algorithms could also be explored.

Contributors

dela Resma, Marvee

Ginez, Zhoya

Inocencio, Ken

Nepomuceno, Colleen

Piquero, Geran

Punzalan, Paolo