/beer-advocate-rating-predictor

Applied regression algorithms on craft beer datasets to predict BeerAdvocate ratings.

MIT LicenseMIT

Python Numpy Pandas Statsmodels Sci-Kit Learn
PyPI PyPI version PyPI version PyPI version PyPI version

Repo update in progress! Slides live here. Code coming soon

Beer Advocate Rating Predictor

Applied ensembles of regression algorithms on craft beer datasets to predict BeerAdvocate ratings.

Problem Statement

  • Though the beer industry as a whole may not have changed in market size in recent years, strangely enough there’s an increase in the production of craft beer, import beer, and export craft beers, Why?
  • Turns out, the craft beer market is steadily gaining, compared with other, lagging beer categories.
    • Light beer fell from a 47.3% market share in 2014 to a 45.5% share in 2015.
    • Projections for Big Beer in 2016 are more of the same: gradual decline. Big brands are under threat.

Why bother predicting the next hit beers?

  • Craft drinkers are turning away from the name-brand craft beers towards smaller, newer, edgier breweries.
  • If craft breweries are to continue growing, they have to continue satisfying the evolving consumer palate.

Producing the new beers that customers enjoy begins with accurately predicting the popular craft beers of the future.

Data

  • Craftcans.com: Scraped CraftCans.com database for their 2692 craft canned beers and 550 different breweries.

    • Features: Beer name, Style, Size sold (ounces), Alcohol by volume (ABV), IBU’s (International Bitterness Units), Brewer name, Brewer city, Brewer state
  • BeerAdvocate.com: Scraped the BeerAdvocate.com beer ranking tables for their 4000+ craft beer ratings.

    • Features: Features: Beer name, Brewer, Brewer state, Volume of ratings, Volume of reviews, Beer Weighted Rank
    • Why BA? Neither advertisers, sponsorships, or industry relationships influence the opinions expressed by BeerAdvocate and its founders. They have specific Industry Posting Rules that prevent the rating or reviewing of places or beers that users own or are affiliated with. The only biases that might exists come from the community of users and their demographics.

Analytical Approach

Will better explain the questions guiding analyses and a summary of the methods involved.

  • Linear Regression
  • Ridge Regression
  • Lasso
  • CART
  • Extratrees
  • RandomForestRegress
  • AdaBoostedTrees
  • GradBoostedTrees
  • GridSearch

Tool Stack

  • Jupyter notebook
  • Python 3.5
  • BeautifulSoup4, Selenium
  • Statsmodels
  • Sci-kit Learn
  • Matplotlib, Seaborn

Might include step by step series of examples that tell you have to get a development env running

Visuals [In Development]

  • See slide deck. Graphs coming soon

Conclusions

More to come. Will explain insights gleaned, model evaluation, or patterns in visualization.

Future Work

  • feature engineering
  • xgboost
  • hyperparameter tuning

Author

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments & Inspirations