Doing Data Science Project 1 Case Study:
The purpose of this project was to create a presentation for a hypothetical audience of the Budweiser CEO and CFO. The goal was to address the nine questions and items listed below.
- How many breweries are present in each state?
- Merge beer data with the breweries data. Print the first 6 observations and the last six observations to check the merged file.
- Address the missing values in each column.
- Compute the median alcohol content and international bitterness unit for each state. Plot a bar chart to compare.
- Which state has the maximum alcoholic (ABV) beer? Which state has the most bitter (IBU) beer?
- Comment on the summary statistics and distribution of the ABV variable.
- Is there an apparent relationship between the bitterness of the beer and its alcoholic content? Draw a scatter plot. Make your best judgment of a relationship and EXPLAIN your answer.
- Budweiser would also like to investigate the difference with respect to IBU and ABV between IPAs (India Pale Ales) and other types of Ale (any beer with “Ale” in its name other than IPA). You decide to use KNN classification to investigate this relationship. Provide statistical evidence one way or the other. You can of course assume your audience is comfortable with percentages.
- Find one other useful inference from the data that you feel Budweiser may be able to find value in. You must convince them why it is important and back up your conviction with appropriate statistical evidence.
Compute the Median Alcohol Content and International Bitterness Unit for Each State. Plot a Bar Chart to Compare.
The median alcohol content (ABV) and International Bitterness Units (IBU) for each state are displayed in the tables below. Note that missing ABV and IBU values affect which states appear on the map. Missing values are excluded from the median calculations, so a state with no recorded values for a variable has no median to plot. For example, South Dakota is removed (grayed out) in the visualization of median IBU.
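The effect of missing values on the per-state medians can be illustrated with a minimal sketch. The rows below are hypothetical sample data rather than the project's dataset, and the helper name `median_by_state` is an assumption introduced for illustration only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (state, abv, ibu). None marks a missing measurement.
beers = [
    ("CO", 0.065, 60),
    ("CO", 0.050, None),
    ("CO", 0.070, 40),
    ("SD", 0.069, None),
    ("SD", 0.050, None),
    ("TX", 0.055, 33),
]

def median_by_state(rows, column):
    """Median of one column (1 = ABV, 2 = IBU) per state, skipping missing values."""
    values = defaultdict(list)
    for row in rows:
        if row[column] is not None:
            values[row[0]].append(row[column])
    # A state whose values are all missing never gets a key, so it has no median.
    return {state: median(vals) for state, vals in values.items()}

median_abv = median_by_state(beers, 1)
median_ibu = median_by_state(beers, 2)
```

In this toy data, SD has ABV values but no IBU values, so it appears in `median_abv` and is absent from `median_ibu`. This is the same mechanism that leaves a state without a value on the median IBU map.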
Is There an Apparent Relationship Between the Bitterness of the Beer and Its Alcoholic Content? Draw a Scatter Plot.
It appears that there is a positive relationship between the bitterness of beer and its alcoholic content. The scatter plot shows that as the bitterness rating (IBU) increases, Alcohol by Volume (ABV) tends to increase as well. We created a series of scatter plots to show more clearly how the relationship between IBU and ABV varies across styles. These plots are below:
First, this plot shows the general relationship with all styles included:
This visualization is the same, but filtered for only IPA beverages:
Below is a scatter plot of the same ABV and IBU relationship, filtered to only Ale beverages:
Finally, this scatter plot displays the ABV and IBU relationship with styles filtered to include every type that is NOT an IPA or an Ale:
It appears that across most styles of beer there is a positive relationship between IBU and ABV. It is worth noting, however, that the "Ale" style of beer shows a somewhat curved, arguably non-linear relationship. Most of the values with ABV greater than 6% appear evenly scattered rather than following a trend, so ABV loses its visual power to explain the variance in IBU ratings.
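One way to back up the visual impression is to compute a correlation coefficient for each style group. The sketch below is illustrative only: the rows are hypothetical, and the grouping rule in `style_group` (substring matching on the style name) and the choice of Pearson correlation are assumptions, not necessarily what was done for the plots above.

```python
from math import sqrt

# Hypothetical rows: (style, ibu, abv). Not the project's data.
beers = [
    ("American IPA", 60, 0.065),
    ("American IPA", 70, 0.070),
    ("American Double / Imperial IPA", 90, 0.085),
    ("American Pale Ale (APA)", 35, 0.052),
    ("American Amber / Red Ale", 30, 0.055),
    ("American Blonde Ale", 20, 0.048),
    ("American Pale Lager", 12, 0.045),
    ("Czech Pilsener", 35, 0.050),
    ("American Porter", 40, 0.060),
]

def style_group(style):
    """Assumed grouping rule: IPA first, then any other Ale, then everything else."""
    if "IPA" in style or "India Pale Ale" in style:
        return "IPA"
    if "Ale" in style:
        return "Ale"
    return "Other"

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Collect IBU and ABV values per group, then correlate within each group.
groups = {}
for style, ibu, abv in beers:
    xs, ys = groups.setdefault(style_group(style), ([], []))
    xs.append(ibu)
    ys.append(abv)

correlations = {name: pearson(xs, ys) for name, (xs, ys) in groups.items()}
```

Pearson correlation measures only linear association, so a curved relationship such as the one described for the Ale group could yield a modest coefficient even when the variables are clearly related.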
We found with our model that it is possible to predict whether a beer is an IPA or another Ale based on its IBU and ABV values, with an accuracy of 99.66102%. The model is a k-nearest neighbors (KNN) classifier, which predicts the class of a beer from the classes of the k observations closest to it in IBU and ABV.
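The idea behind KNN classification can be sketched in a few lines. This is a toy illustration under stated assumptions, not the model used in the project: the training points are hypothetical, the function names `knn_predict` and `loo_accuracy` are introduced here for illustration, and leave-one-out evaluation is one possible way to compare values of k.

```python
from collections import Counter
from math import dist

# Hypothetical training data: ((ibu, abv_percent), label). Not the project's data.
train = [
    ((65, 6.8), "IPA"), ((70, 7.0), "IPA"), ((85, 8.2), "IPA"), ((60, 6.5), "IPA"),
    ((30, 5.2), "Ale"), ((25, 5.0), "Ale"), ((35, 5.6), "Ale"), ((20, 4.8), "Ale"),
]

def knn_predict(points, query, k):
    """Label a query point by majority vote among its k nearest neighbors."""
    nearest = sorted(points, key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(points, k):
    """Leave-one-out accuracy: predict each point from all of the others."""
    hits = 0
    for i, (features, label) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(points)

# Compare a few odd values of k (odd avoids ties with two classes).
best_k = max((1, 3, 5), key=lambda k: loo_accuracy(train, k))
```

For example, `knn_predict(train, (75, 7.5), 3)` votes among the three closest training beers. Because IBU spans a much wider numeric range than ABV, it dominates the unscaled Euclidean distance used here, so in practice it is usually worth standardizing both features before fitting.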
- The top five states with the most breweries are CO (46), CA (39), MI (32), OR (29), and TX (28).
- Missing IBU and ABV values are likely due to reporting differences from brewery to brewery or from state to state (for example, differing legal reporting requirements). Because these data can provide important insight into various markets, we suggest making an effort to obtain the missing information where possible.
- The state with the highest median IBU is Maine (ME). The highest median ABV is found in Washington, D.C. (DC).
- The state with the highest-ABV beer is Colorado (CO). The state with the most bitter (highest-IBU) beer is Oregon (OR).
- The distribution of Alcohol by Volume across the United States appears to be right-skewed. In a right-skewed distribution, most values cluster at the lower end while a long tail extends toward higher values, which pulls the mean above the median. A symmetric distribution, by contrast, has values spread equally to the left and right of the mean. A visual representation of this can be viewed below.
- Overall, it appears that there is a positive relationship between the bitterness of beer and its alcoholic content. The scatter plot shows that as the bitterness rating (IBU) increases, Alcohol by Volume (ABV) tends to increase as well. When we look more closely and plot the data by style, the positive relationship between IBU and ABV holds across most styles of beer. It is worth noting, however, that the "Ale" style of beer shows a somewhat curved, arguably non-linear relationship. Most of the values with ABV greater than 6% appear evenly scattered rather than following a trend, so ABV loses its visual power to explain the variance in IBU ratings.
- We found with our model that it is possible to predict whether a beer is an IPA or another Ale based on its IBU and ABV values, with an accuracy of 84.64%. The model is a k-nearest neighbors (KNN) classifier, which predicts the class of a beer from the classes of the k observations closest to it in IBU and ABV. Additionally, we created a model to find the optimal value of k so that we could tune this parameter for accurate results.
- Because Booze and Brews is looking to enter the US brewery market, we thought it would be useful to visualize which styles of brew hold the greatest (or least) market share. The data show that 73.53% of the brew offerings are non-IPA styles. This raises interesting questions, such as:
  - Is the current saturation of IPAs high or low compared to consumer demand?
  - How significant are IPAs in terms of sales, as a percentage of goods sold?
  - Does the relatively low percentage of IPA offerings mean there is an opportunity to enter the market with an IPA, or does it simply reflect the level of demand for IPAs?

  Answering these questions would require additional data, which could provide valuable insight into the brewery market and the demand that currently exists within it.
- Finally, because Booze and Brews is looking to capture Texas market share, we thought it would be useful to visualize how the stylistic characteristics (IBU and ABV) are distributed across cities in Texas. The data show that the median IBU for Texas is 33 and the median ABV for Texas is 5.5%. This information, along with the visualizations below, illustrates which stylistic profiles are prominent in these geographical areas. We have visual evidence that Texans do seem to prefer a specific flavor profile, and investigating this further could help establish a successful brewery with offerings that appeal to the palate of the target audience.
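The style share and the state-level medians quoted in this summary are simple to reproduce. The following is a minimal sketch on hypothetical rows. The helper names `non_ipa_share` and `state_median`, and the rule that a style counts as an IPA when its name contains "IPA", are assumptions made for illustration.

```python
from statistics import median

# Hypothetical rows: (style, state, ibu, abv). None marks a missing value.
beers = [
    ("American IPA", "TX", 70, 0.068),
    ("American Pale Ale (APA)", "TX", 35, 0.055),
    ("American Blonde Ale", "TX", 20, 0.050),
    ("Czech Pilsener", "CO", 35, 0.050),
]

def non_ipa_share(rows):
    """Percentage of offerings whose style name does not contain 'IPA'."""
    non_ipa = [row for row in rows if "IPA" not in row[0]]
    return 100 * len(non_ipa) / len(rows)

def state_median(rows, state, column):
    """Median of one column (2 = IBU, 3 = ABV) for one state, skipping missing values."""
    values = [row[column] for row in rows
              if row[1] == state and row[column] is not None]
    return median(values)

share = non_ipa_share(beers)
texas_ibu = state_median(beers, "TX", 2)
texas_abv = state_median(beers, "TX", 3)
```

The numbers produced by this toy data are not the project's results. The sketch only shows the shape of the calculation.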