nba-monte-carlo

Monte Carlo simulation of the NBA season, leveraging meltano, dbt, duckdb and superset

Primary language: Makefile. License: MIT.

MDS in a box

This project serves as an end-to-end example of running the "Modern Data Stack" in a local environment. For those looking for a more integrated experience, devcontainers have been implemented as well: if you have Docker and WSL installed, the container can be booted up right from VS Code.

Current progress

Right now, you can pull the NBA schedule and Elo ratings with this project and query the simulated results. More is on the way; see the to-dos at the bottom of this README. The dbt docs are also self-hosted on GitHub Pages.
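The core of a Monte Carlo season simulation is turning two Elo ratings into a win probability and then sampling game outcomes from it. The sketch below is a minimal Python illustration, not the project's dbt implementation: it uses the standard logistic Elo formula, and the 100-point home-court bonus is an assumed value chosen for illustration (check the project's dbt models for what it actually uses).

```python
import random


def home_win_probability(home_elo: float, away_elo: float, home_advantage: float = 100.0) -> float:
    """Standard Elo expected score for the home team.

    home_advantage is an ASSUMED bonus in Elo points, not a value taken from this project.
    """
    diff = (home_elo + home_advantage) - away_elo
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def simulate_game(home_elo: float, away_elo: float, rng: random.Random) -> bool:
    """Return True if the home team wins one simulated game."""
    return rng.random() < home_win_probability(home_elo, away_elo)


def estimate_home_win_rate(home_elo: float, away_elo: float, scenarios: int = 10_000, seed: int = 42) -> float:
    """Monte Carlo estimate: the fraction of simulated games the home team wins."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    wins = sum(simulate_game(home_elo, away_elo, rng) for _ in range(scenarios))
    return wins / scenarios


if __name__ == "__main__":
    # Evenly rated teams: the only edge is the assumed home-court bonus (roughly 0.64).
    print(round(home_win_probability(1500, 1500), 3))
    print(estimate_home_win_rate(1500, 1500))
```

With enough scenarios the simulated rate converges on the closed-form probability; the simulation becomes worthwhile once many games feed into standings, where no simple closed form exists.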

Getting started - Windows

  1. Create your WSL environment. Open a PowerShell terminal running as an administrator and execute:
wsl --install
  • If this is the first time WSL has been installed on this machine, restart your machine.
  2. Open Ubuntu in your terminal and update your package lists.
sudo apt-get update
  3. Install Python 3.8, pip, and venv.
sudo apt-get install python3.8 python3-pip python3.8-venv
  4. Clone this repo.
mkdir meltano-projects
cd meltano-projects
git clone https://github.com/matsonj/nba-monte-carlo.git
# Move into the folder that git just created
cd nba-monte-carlo
  5. Build your project.
make build pipeline superset-visuals
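Before running the build, it can help to confirm that the prerequisites installed in the steps above are actually visible. This is a hypothetical preflight script, not part of the repo; it only checks that the modules and commands are present, and does not prove that `python3 -m venv` will succeed on Ubuntu, where pip bootstrapping for venvs comes from the separate `python3.8-venv` package.

```python
import importlib.util
import shutil
import sys


def preflight() -> dict:
    """Report whether the prerequisites from the setup steps look available."""
    return {
        # The setup steps install Python 3.8; treat anything older as missing.
        "python>=3.8": sys.version_info >= (3, 8),
        # Modules are checked by whether they are importable, not by running them.
        "pip": importlib.util.find_spec("pip") is not None,
        "venv": importlib.util.find_spec("venv") is not None,
        # Commands are checked by looking them up on PATH.
        "git": shutil.which("git") is not None,
        "make": shutil.which("make") is not None,
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:12} {'ok' if ok else 'MISSING'}")
```

Run it with `python3` from inside the cloned folder; any `MISSING` line points back to the step to repeat.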

Make sure to open Superset when prompted (the default location is localhost:8088). The username and password are "admin" and "password", respectively.

Using Docker and Kubernetes

You can build a docker container by running:

make docker-build

Then run the container using:

make docker-run

Both are targets defined in the Makefile:

docker-build:
	docker build -t mdsbox .

docker-run:
	docker run \
		--env MELTANO_CLI_LOG_LEVEL=WARNING \
		--env MDS_SCENARIOS=100 \
		--env MDS_INCLUDE_ACTUALS=true \
		--env MDS_LATEST_RATINGS=true \
		--env MDS_ENABLE_EXPORT=true \
		mdsbox make pipeline
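The `docker-run` target configures the pipeline entirely through `MDS_*` environment variables. As a hypothetical illustration of how a Python step could read them, here is a small typed loader; the boolean parsing rule and the defaults (including 10000 scenarios) are assumptions, not taken from this project's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    # ASSUMED convention: "true", "1" or "yes" (any case) mean enabled.
    return value.strip().lower() in {"true", "1", "yes"}


@dataclass(frozen=True)
class PipelineConfig:
    scenarios: int
    include_actuals: bool
    latest_ratings: bool
    enable_export: bool


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Build a config from the MDS_* variables; the defaults here are hypothetical."""
    scenarios = int(env.get("MDS_SCENARIOS", "10000"))
    if scenarios <= 0:
        raise ValueError("MDS_SCENARIOS must be a positive integer")
    return PipelineConfig(
        scenarios=scenarios,
        include_actuals=_as_bool(env.get("MDS_INCLUDE_ACTUALS", "false")),
        latest_ratings=_as_bool(env.get("MDS_LATEST_RATINGS", "false")),
        enable_export=_as_bool(env.get("MDS_ENABLE_EXPORT", "false")),
    )


if __name__ == "__main__":
    # Mirrors the values passed by the docker-run target.
    print(load_config({
        "MDS_SCENARIOS": "100",
        "MDS_INCLUDE_ACTUALS": "true",
        "MDS_LATEST_RATINGS": "true",
        "MDS_ENABLE_EXPORT": "true",
    }))
```

Accepting any mapping instead of reading `os.environ` directly keeps the loader easy to test without touching the real environment.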

You can then scale out to Kubernetes, assuming you have it installed:

kubectl apply -f ./kubernetes/pod.yaml

Using Parquet instead of a database

This project uses Parquet files instead of a database for storage. This approach is experimental, and the implementation will evolve over time.

Todos

  • replace reg season schedule with 538 schedule
  • add table for results
  • add config options in dbt vars to ignore completed games
  • make simulator only sim incomplete games
  • add table for new ratings
  • add config to use original or new ratings
  • clean up dbt-osmosis
  • clean up env vars + implement incremental builds
  • clean up dev container plugins (remove irrelevant ones, add some others)
  • add dbt tests on simulator tables that no numeric values are null (elo ratings, home team win probabilities)
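Several of the to-dos above (ignoring completed games, simulating only incomplete ones) come down to one idea: keep the actual result wherever a game has been played and sample only the rest. A minimal Python sketch of that idea follows; the `Game` record and the 100-point home-court bonus are hypothetical choices for illustration, and this is not the project's dbt logic.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Game:
    home: str
    away: str
    # None means the game has not been played yet.
    home_won: Optional[bool] = None


def home_win_probability(home_elo: float, away_elo: float, home_advantage: float = 100.0) -> float:
    # Standard logistic Elo formula; the home-court bonus is an ASSUMED value.
    return 1.0 / (1.0 + 10 ** (-((home_elo + home_advantage) - away_elo) / 400.0))


def simulate_season(games: List[Game], elo: Dict[str, float], scenarios: int, seed: int = 0) -> Dict[str, float]:
    """Average wins per team across scenarios, keeping actual results fixed."""
    rng = random.Random(seed)
    total_wins: Counter = Counter()
    for _ in range(scenarios):
        for game in games:
            if game.home_won is None:
                # Only unplayed games are sampled.
                home_won = rng.random() < home_win_probability(elo[game.home], elo[game.away])
            else:
                home_won = game.home_won
            total_wins[game.home if home_won else game.away] += 1
    return {team: total_wins[team] / scenarios for team in elo}


if __name__ == "__main__":
    # Hypothetical teams and ratings: one completed game, one still to play.
    games = [Game("BOS", "NYK", home_won=True), Game("NYK", "BOS")]
    print(simulate_season(games, {"BOS": 1600, "NYK": 1500}, scenarios=1000))
```

Because completed games never touch the random generator, their outcomes are identical in every scenario, which is what a switch like `MDS_INCLUDE_ACTUALS` would need to guarantee.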

Optional stuff

  • add dbt tests
  • add model descriptions
  • change elo calculation to a udf
  • make playoff elimination stuff a macro (param: schedule type)
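For the "change elo calculation to a udf" item, a rating update written as a pure scalar function is the shape a UDF would take. The sketch below shows the standard Elo update in Python; the K-factor of 20 is an assumed value, and no home-court or margin-of-victory adjustment is applied, so it may differ from the calculation this project currently performs.

```python
from typing import Tuple


def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expected score for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def elo_update(winner_elo: float, loser_elo: float, k: float = 20.0) -> Tuple[float, float]:
    """Return (new_winner_elo, new_loser_elo) after one game.

    k=20 is an ASSUMED K-factor; no home-court or margin-of-victory adjustment is applied.
    """
    # The winner gains more when the win was less expected.
    shift = k * (1.0 - expected_score(winner_elo, loser_elo))
    # Elo is zero-sum: the winner gains exactly what the loser gives up.
    return winner_elo + shift, loser_elo - shift
```

For example, two teams rated 1500 each move to 1510 and 1490 after one game, while a 1700-rated favorite beating a 1300-rated team gains under 2 points.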

Source Data

The data contained within this project comes from 538, Basketball Reference, and DraftKings.