/MSBA327_Summer_2019

MSBA 327 Text Analytics Group Project.

Primary language: Jupyter Notebook | License: Apache License 2.0

Predicting Movie Success

Which movie genres will be most successful at the Sundance Film Festival?

The objective of the final project is to explore topics in text analytics under uncertainty in greater depth than is permitted in class. The choice of topic is up to you, but it should be related to the general themes of the course. As part of the project you should:

  • describe an approach (existing or newly developed),
  • apply the approach to a problem of interest, and
  • analyze the performance of the approach according to a set of metrics.

Authors

Prepared by Dominc Andreas | Brigite T. Crow | Xiaoting (Theresa) Liu | Kearey Smith

Instructor: Professor Piradee Nganrungruang

Summer 2019

Project Description

We envision ourselves as movie directors who want to shoot a film that will land in the Sundance Film Festival within a few years. To do this, we believe we need to identify which films are rated at least four out of five stars and which genres and concepts (i.e., tags) have resonated most with users. We analyzed these aspects across the 9,742 historical films in the current MovieLens dataset through Exploratory Data Analysis: descriptive statistics, a scatter plot of year versus average rating, genre frequencies, the most frequent genres and tags among four- to five-star films, the top five films rated four to five stars, and the top five films rated 0.5 to one star. We implemented four text analytics approaches: Text Clustering, Sentiment Analysis, Text Classification, and a Content-Based Recommender. We measured success by identifying the top five genres and tags, along with the movie titles, among films rated four stars and up, and applied this knowledge to create our movie.
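The genre-frequency step for highly rated films can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `top_genres`, the toy records, and the use of a mean rating of 4.0 as the cutoff are all assumptions. The only thing taken from the dataset is the general shape of a MovieLens row, where genres are pipe-separated.

```python
from collections import Counter, defaultdict

# Hypothetical toy data shaped like MovieLens rows (titles and ratings are made up).
movies = {
    1: ("Film A", "Drama|Romance"),
    2: ("Film B", "Comedy"),
    3: ("Film C", "Drama|Documentary"),
    4: ("Film D", "Horror"),
}
ratings = [(1, 4.5), (1, 4.0), (2, 3.0), (2, 2.5), (3, 5.0), (3, 4.0), (4, 1.0)]


def top_genres(movies, ratings, min_avg=4.0, k=5):
    """Count genres among movies whose mean rating is at least min_avg."""
    # Group individual ratings by movie id.
    by_movie = defaultdict(list)
    for movie_id, rating in ratings:
        by_movie[movie_id].append(rating)

    # Tally the genres of every movie that clears the rating threshold.
    counts = Counter()
    for movie_id, movie_ratings in by_movie.items():
        if sum(movie_ratings) / len(movie_ratings) >= min_avg:
            counts.update(movies[movie_id][1].split("|"))
    return counts.most_common(k)


print(top_genres(movies, ratings, k=5))
```

The same tally could be run over tags instead of genres by swapping in a mapping from movie id to tag list.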

Methodology and Approach

We examined four text analytics approaches, supported by text representation and exploratory data analysis, to answer our key question: Which movie genre will do best at the upcoming Sundance Film Festival?

  • Text Representation (Crow)
  • Exploratory Data Analysis (Andreas and Crow)
  • Text Clustering (Andreas)
  • Sentiment Analysis (Liu)
  • Text Classification (Crow)
  • Content-Based Recommender (Smith)
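To give a sense of what a content-based recommender does, here is a small sketch that ranks films by how much their genre sets overlap with a chosen film. The README does not say how the project's recommender was built, so the Jaccard similarity used here, the helper names `jaccard` and `recommend`, and the toy catalog are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(title, catalog, k=3):
    """Return the k titles whose genre sets are most similar to the given title."""
    target = catalog[title]
    scores = [
        (other, jaccard(target, genres))
        for other, genres in catalog.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical catalog: title -> set of genres (made-up films).
catalog = {
    "Film A": {"Drama", "Romance"},
    "Film B": {"Comedy"},
    "Film C": {"Drama", "Documentary"},
    "Film D": {"Drama", "Romance", "Comedy"},
}

for other, score in recommend("Film A", catalog, k=2):
    print(f"{other}: {score:.2f}")
```

Set overlap keeps the example dependency-free. A fuller version might weight tags as well as genres, for instance with TF-IDF vectors and cosine similarity.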