NBA Models

The goal of this project is to create a series of statistical models regarding NBA statistics, with the ultimate goal of implementing a website predicting odds for each major award and all star selections. This includes developing a comprehensive scraper to efficiently pull data from Basketball Reference, respecting the website's Use of Data policies. The scraper is built in Python, and the data analysis is done in R.

This project also aims to eventually combine game statistics with social media data. There is no doubt all star voting and awards selection is heavily influenced by social media trends and media narratives, and I suspect accounting for this would improve the model. The first step for this will be including Twitter data, pending a developer account application.

Towards Data Science

An article on this project has been published by Towards Data Science. This article can be found here.

Getting Started

At this stage, usage of the project is fairly self explanatory. Simply use scraper.py for the scraping functionality, or loadData.R and MVPModels.R. Prerequisites for each of these, and libraries used are listed below or within the respective files.

Currently Implemented

Scraper for MVP data, season player total statistics, season player per game statistics, season player advanced statistics, and team standings
Models predicting MVP outcome using MVP data and total and advanced player statistics
Leave-one-out cross validation system to quickly evaluate new models

Planed Features

Scraper

~~Add scraping for advanced statistics~~
Add scraping for other awards

MVP Models

~~Model using per game statistics~~
Accounting for shortened seasons
Accounting for reverse recency bias
Adding feature to analyze player improvement relative to past season
~~Add advanced stats to models~~
Account for post All-Star break numbers

Long-term

Create models for other awards
Add social media data
Merge data into single dataset and a build a system to query that dataset
Deploy models into live-updating website

Prerequisites

To use this code, you'll need a Python installation, along with Selenium with the appropriate drivers for the web crawler. You'll also need a R distribution installed -- I recommend RStudio. All of this is free to use, with links provided below.

Deployment

The goal for this project is to ultimately deploy it live onto a website, but the project is far from that point. Once the project is deployed, this README will be updated to include deployment instructions.

Built With

Atom - Text editor used
Hydrogen - Interactive coding environment for Atom
RStudio - IDE for R
Selenium - Framework for web crawling

Contributing

Any contributions are welcome, as long as the code is functional and at least some effort is made to document it.

Authors

Caio Brighenti - Developer - CaioBrighenti

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

Stack Overflow - For obvious reasons
Harvard Sports Analytics - For inspiration and a baseline comparison

CaioBrighenti/nba-models