Goodreads-Genre-Reviews-Analysis

Ashley Feiler's term project for Data Science for Linguists 2023

Date: May 1, 2023

What's the Story?: Linguistic Variation in Goodreads Reviews by Genre

This project examines the text of public Goodreads reviews of books from different genres (young adult, mystery, fantasy, biography, etc.) to see whether the way reviewers use language differs based on the genre they are reading and reviewing. Which genre's readers write the longest reviews? What adjectives do reviewers use most frequently for each genre? Is there a difference in sentiment in the language used when reviewing one genre vs. another? Through this project, I explore all these questions and more.
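As a rough illustration of the first question (which genre's readers write the longest reviews), here is a minimal sketch and not necessarily the method used in this project. It assumes each review is a dict with a `genre` key and a `review_text` key (both names are assumptions for illustration) and measures length as whitespace-separated tokens.

```python
from collections import defaultdict
from statistics import mean


def mean_review_length(reviews):
    """Return {genre: mean token count} for an iterable of review dicts.

    Assumes each review has "genre" and "review_text" keys (hypothetical
    field names) and counts tokens by splitting on whitespace.
    """
    lengths = defaultdict(list)
    for review in reviews:
        token_count = len(review["review_text"].split())
        lengths[review["genre"]].append(token_count)
    return {genre: mean(counts) for genre, counts in lengths.items()}


if __name__ == "__main__":
    toy_reviews = [
        {"genre": "mystery", "review_text": "Kept me guessing until the end"},
        {"genre": "fantasy", "review_text": "Gorgeous worldbuilding"},
    ]
    print(mean_review_length(toy_reviews))
```

Whitespace splitting is the simplest possible tokenizer; a proper tokenizer would give somewhat different counts, especially around punctuation.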

Data

I used data from the UCSD Book Graph corpus, which contains public book review data scraped from Goodreads (specifically, I used their genre-level subsets of the data). It includes over 15 million reviews in JSON files, along with separate JSON files containing metadata on the books, authors, and genres reviewed. To make the data a more manageable size, I took a 5,000-review sample from each of 8 genres, for 40,000 total reviews in my dataset. This dataset is specified for academic use only and is not to be redistributed, so I only share small samples of this data.
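The per-genre sampling step described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes each genre file is gzipped JSON lines (one review object per line), and the helper names `read_json_lines` and `sample_reviews`, the seed, and the file name in the comment are all hypothetical. It uses reservoir sampling so a large file never has to be loaded into memory at once.

```python
import gzip
import json
import random
from typing import Iterable, Iterator, List


def read_json_lines(path: str) -> Iterator[dict]:
    """Yield one review dict per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def sample_reviews(reviews: Iterable[dict], k: int, seed: int = 0) -> List[dict]:
    """Reservoir-sample up to k reviews uniformly at random in one pass."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir: List[dict] = []
    for index, review in enumerate(reviews):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(review)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = review
    return reservoir


# Hypothetical usage (the file name is an assumption):
# mystery_sample = sample_reviews(read_json_lines("reviews_mystery.json.gz"), 5000)
```

Repeating the call once per genre file and concatenating the results would give a balanced dataset of the size described above.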

Guestbook

Come visit my guestbook!! I'm always grateful for feedback.

References

UCSD Data References

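Mengting Wan, Rishabh Misra, Ndapa Nakashole, and Julian McAuley. 2019. Fine-Grained Spoiler Detection from Large-Scale Review Corpora. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).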

Mengting Wan and Julian McAuley. 2018. Item recommendation on monotonic behavior chains. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18). Association for Computing Machinery, New York, NY, USA, 86–94. https://doi.org/10.1145/3240323.3240369

NLTK Sentiment References

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
