L.Wood_Final_Capstone_Project

Capstone Project Report - Factors for Success on Broadway

Author: Loni J Wood Date: 4/25/2023

Project Overview

The purpose of this project was to discover what factors make a show successful on Broadway. Success was measured by gross sales, and machine learning models were used to predict it.

Project Plan Steps

  1. Create GitHub Repository
  2. Define Domain and Question
  3. Data Collection
  4. Data Review and Cleaning
  5. Exploratory Data Analysis
  6. Machine Learning
  7. Analyze Findings

Dataset

The dataset for this project can be found at: https://corgis-edu.github.io/corgis/csv/broadway/

The dataset was originally gathered by the Broadway League. https://www.broadwayleague.com/home/

Report

The Overleaf report can be found at: https://www.overleaf.com/read/qwjkgksxjndz. This is the full report for the project and includes the steps taken and the details of the analysis and results.

All screenshots and images used in the report can be found at: https://github.com/lwood7983/L.Wood_Final_Capstone_Project/tree/main/images

Additional resources cited in the report can be found at:

Before you begin

This project used both Excel and Python via a Jupyter Notebook for the analysis. Before you start, be sure you have the following installed.

File Descriptions

Project Results

Linear regression, multiple regression, polynomial regression with degree 2, polynomial regression with degree 3, and elastic net regression with degrees 3 and 8 were used to predict gross sales. Each model was run with and without attendance as an independent variable, because attendance is not known in advance. Overall, the polynomial with degree 2, or quadratic polynomial, performed the best at predicting gross sales in both scenarios. Multiple regression performed the worst, with the highest RMSE and the lowest R^2 score in both scenarios. While the polynomial with degree 3, or cubic polynomial, performed about the same as the quadratic polynomial, it showed slight overfitting. Overfitting can cause problems with new data sets because the model begins to lose its ability to generalize.

[Figures: results with attendance; results without attendance]
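
The two scores used to compare the models can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the function names and the sample values are hypothetical and are not taken from the project results. A lower RMSE and a higher R^2 both indicate a better fit.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical values for illustration only (not project data).
actual = [10.0, 20.0, 30.0, 40.0]
close_fit = [11.0, 19.0, 31.0, 39.0]
poor_fit = [20.0, 20.0, 30.0, 30.0]

for name, predicted in [("close fit", close_fit), ("poor fit", poor_fit)]:
    print(name, round(rmse(actual, predicted), 3), round(r_squared(actual, predicted), 3))
```

On the same evaluation data, the model with the highest RMSE is also the one with the lowest R^2, since both are computed from the same squared errors.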

A deeper analysis was done to identify what was driving the increase in gross sales over the years. Two possible drivers were investigated: inflation and attendance. The first was inflation. The consumer price index was used to calculate the inflation-adjusted average ticket price for each year, which was then compared with the unadjusted average ticket price. Once inflation was removed, the ticket price showed only a minimal upward trend compared with the unadjusted average ticket price.

[Figure: inflation scatter plot]
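
The inflation adjustment can be sketched as follows. This is a minimal example under assumptions: the function name, the choice of base year, and every number below are placeholders for illustration, not real CPI figures or ticket prices from the dataset, and the report may have applied the adjustment differently.

```python
def adjust_for_inflation(nominal_price, cpi_in_year, cpi_in_base_year):
    """Express a price from one year in base-year dollars using the CPI ratio."""
    return nominal_price * cpi_in_base_year / cpi_in_year


# Placeholder values for illustration only (not real CPI or ticket prices).
nominal_prices = {1990: 40.0, 2016: 100.0}
cpi = {1990: 100.0, 2016: 200.0}
base_year = 2016  # assumption: prices are restated in dollars of the last year

real_prices = {
    year: adjust_for_inflation(price, cpi[year], cpi[base_year])
    for year, price in nominal_prices.items()
}

nominal_growth = nominal_prices[2016] / nominal_prices[1990] - 1
real_growth = real_prices[2016] / real_prices[1990] - 1
print(real_prices, nominal_growth, real_growth)
```

With these placeholder numbers the unadjusted price grows by 150% while the adjusted price grows by 25%, which is the kind of gap the comparison is meant to expose.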

The second possible driver was attendance. A scatter plot comparing gross sales and attendance revealed that both trend upward, with a sharp jump around 2016-2017. The jump could be related to the cleanup discussed in the article at https://www.city-journal.org/html/unexpected-lessons-times-square%E2%80%99s-comeback-12235.html. The scatter plot also showed that gross sales kept increasing even in periods when attendance declined. The conclusion from these two factors is that inflation is a key part of the increasing gross sales, but attendance, which is a non-inflation factor, also has an impact.

[Figure: gross sales vs. attendance scatter plot]
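
One way to see how gross sales can rise while attendance falls is to treat gross as attendance multiplied by the average ticket price. The sketch below assumes that definition; the function name and the percentages are hypothetical and are not taken from the dataset.

```python
def gross_change(attendance_change, price_change):
    """Relative change in gross when gross = attendance * average ticket price."""
    return (1 + attendance_change) * (1 + price_change) - 1


# Hypothetical example: attendance falls 5% while the average price rises 10%.
change = gross_change(-0.05, 0.10)
print(f"{change:.1%}")
```

In this example the price increase outweighs the drop in attendance, so gross sales still grow.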

Project Limitations

Some project limitations were:

  • The dataset covered only August 1990 to August 2016.
  • The dataset did not include any years during or after Covid.

Future Work Ideas

Some future work ideas are:

  • Understand what impact reviews have on the success of a show. This could be done using sentiment analysis of social media or newspapers.
  • Understand what impact Covid had on the independent variables and how it impacted the gross sales.
  • Understand whether winning a Tony Award affects the success of a show, or whether shows that are already successful are the ones that win.
  • Understand if the lead performers in a show impact its success.

Resources

Below are links to some tutorials that were used to assist in building this report.