/restaurant-and-film-reviews

[PYTHON] Used nltk, pandas, and sklearn to perform text representation on online restaurant and film reviews.


Restaurant and Film Reviews Text Representation

Overview

In Python, I used nltk, pandas, and sklearn to build text representations of online restaurant and film reviews. First, I preprocessed each review by tokenizing it, lemmatizing the tokens, and removing stop words and punctuation. Next, I tagged each remaining word with its part-of-speech label. Finally, I applied TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to measure how relevant each word is to each review. The output can be viewed in the post-tag.csv file.

Dataset

https://github.com/lindngo/restaurant-and-film-reviews/blob/main/reviews-data.csv