The goal of this assignment is to identify the attributes of the best and worst coffee shops in the dataset. The text is fairly raw: the reviews contain dates, the star_rating column contains extra words, and so on. We will therefore clean the data first to support a better analysis.
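As a rough illustration of that cleanup, here is a minimal sketch using only the standard library. The input formats are assumptions, not something the dataset description confirms: it supposes a review begins with an m/d/yyyy date and that a rating looks like " 5.0 star rating ". The helper names `strip_leading_date` and `parse_star_rating` are hypothetical.

```python
import re

# Assumed formats (not confirmed by the dataset): a review that begins with
# an m/d/yyyy date, and a rating string such as " 5.0 star rating ".
DATE_PREFIX = re.compile(r"^\s*\d{1,2}/\d{1,2}/\d{4}\s*")
RATING_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def strip_leading_date(review: str) -> str:
    """Remove a leading date from the review text, if one is present."""
    return DATE_PREFIX.sub("", review).strip()


def parse_star_rating(raw: str) -> float:
    """Pull the first number out of a rating string and return it as a float."""
    match = RATING_NUMBER.search(raw)
    if match is None:
        raise ValueError(f"no numeric rating found in {raw!r}")
    return float(match.group())
```

If the real columns use a different date or rating layout, only the two patterns at the top should need adjusting.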
We will start by exploring the corpus with visualizations of token frequency, and we will clean the text using techniques such as lemmatization and stopword removal.
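The token-frequency step can be sketched with the standard library alone. This example covers tokenizing, stopword removal, and counting; it does not lemmatize, since that would normally rely on an NLP library. The stopword list is a tiny illustrative stand-in, and the function names `tokenize` and `token_counts` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of", "i", "in"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def token_counts(reviews: Iterable[str]) -> Counter:
    """Count tokens across all reviews, skipping stopwords."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(t for t in tokenize(review) if t not in STOPWORDS)
    return counts
```

Calling `token_counts(reviews).most_common(20)` gives the (token, count) pairs that a bar chart of the most frequent tokens would plot.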
Based on the analysis, we will answer the question: what makes the best shops the best, and the worst shops the worst? Graphs and numbers from the analysis should support the conclusions.
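One possible way to put numbers behind that comparison is to contrast how large a share of all tokens each word takes up in the well-rated reviews versus the poorly rated ones. This is a sketch under assumptions: how reviews are split into "good" and "bad" groups is left to the analyst, and `share_difference` is a hypothetical helper rather than a prescribed method.

```python
from collections import Counter
from typing import Dict


def share_difference(good: Counter, bad: Counter) -> Dict[str, float]:
    """For each token, return its share of `good` tokens minus its share of `bad` tokens."""
    # Fall back to 1 so an empty group does not cause a division by zero.
    good_total = sum(good.values()) or 1
    bad_total = sum(bad.values()) or 1
    tokens = set(good) | set(bad)
    return {t: good[t] / good_total - bad[t] / bad_total for t in tokens}
```

Tokens with the largest positive values are relatively more common in the good reviews, and those with the most negative values are relatively more common in the bad ones.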