This project serves as an end-to-end example of running the "Modern Data Stack" in a local environment. For those looking for a more integrated experience, a devcontainer is also provided: if you have Docker and WSL installed, the container can be booted up right from VS Code.
Right now, you can get the NBA schedule and Elo ratings from this project and generate the following query. More is to come; see the to-dos at the bottom of the README. And of course, the dbt docs are self-hosted on GitHub Pages; check them out here.
- Create your WSL environment. Open a PowerShell terminal running as an administrator and execute:
wsl --install
- If this was the first time WSL has been installed, restart your machine.
- Open Ubuntu in your terminal and update your packages.
sudo apt-get update
- Install Python 3.8, pip, and venv.
sudo apt-get install python3.8 python3-pip python3.8-venv
- Clone this repo.
mkdir meltano-projects
cd meltano-projects
git clone https://github.com/matsonj/nba-monte-carlo.git
# Go one folder level down into the folder that git just created
cd nba-monte-carlo
- Build your project.
make build pipeline superset-visuals
Make sure to open up Superset when prompted (the default location is localhost:8088). The username is "admin" and the password is "password".
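If the browser shows an error because Superset is still starting, you can poll it from the terminal first. This is a minimal sketch, assuming curl is installed and that your Superset version exposes a `/health` endpoint on the default port 8088 (if it does not, point the function at the login page instead). The function name `wait_for_url` is hypothetical, not part of this project.

```shell
#!/bin/sh
# Sketch: wait until a URL answers successfully, e.g. Superset's health check.
# Assumes curl is on PATH; wait_for_url is a hypothetical helper.

wait_for_url() {
  local url attempts delay i
  url="$1"
  attempts="${2:-30}"   # number of tries before giving up
  delay="${3:-5}"       # seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP errors (e.g. 502 while starting)
    if curl --silent --fail --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   wait_for_url http://localhost:8088/health && echo "Superset is up"
```

With the defaults this waits up to roughly two and a half minutes before giving up, which is usually enough for a local Superset to come up after the build.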
You can build a Docker image by running:
make docker-build
Then run the container using:
make docker-run
These are both targets defined in the Makefile:
docker-build:
	docker build -t mdsbox .

docker-run:
	docker run \
		--env MELTANO_CLI_LOG_LEVEL=WARNING \
		--env MDS_SCENARIOS=100 \
		--env MDS_INCLUDE_ACTUALS=true \
		--env MDS_LATEST_RATINGS=true \
		--env MDS_ENABLE_EXPORT=true \
		mdsbox make pipeline
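The docker-run target hardcodes its settings. To try other values without editing the Makefile, one option is to wrap the same `docker run` call in a shell function whose variables default to the Makefile values but can be overridden from the environment. This is a minimal sketch: `run_mdsbox` and the `DOCKER` override are hypothetical, and reading `MDS_SCENARIOS` as the number of simulated scenarios is an assumption, since the variables are not documented here.

```shell
#!/bin/sh
# Sketch: the docker-run target as a function with overridable settings.
# Defaults mirror the Makefile; run_mdsbox and DOCKER are hypothetical.
# Set DOCKER=echo for a dry run that prints the command instead of running it.

run_mdsbox() {
  "${DOCKER:-docker}" run \
    --env MELTANO_CLI_LOG_LEVEL="${MELTANO_CLI_LOG_LEVEL:-WARNING}" \
    --env MDS_SCENARIOS="${MDS_SCENARIOS:-100}" \
    --env MDS_INCLUDE_ACTUALS="${MDS_INCLUDE_ACTUALS:-true}" \
    --env MDS_LATEST_RATINGS="${MDS_LATEST_RATINGS:-true}" \
    --env MDS_ENABLE_EXPORT="${MDS_ENABLE_EXPORT:-true}" \
    mdsbox make pipeline
}

# Usage (assumes MDS_SCENARIOS sets the scenario count):
#   MDS_SCENARIOS=1000 run_mdsbox
```

An alternative with the same effect is to turn the hardcoded values in the Makefile into `?=` variables, so that `make docker-run MDS_SCENARIOS=1000` works directly.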
You can then scale out to Kubernetes, assuming you have a cluster and kubectl set up:
kubectl apply -f ./kubernetes/pod.yaml
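After applying the manifest, you can wait for the pipeline to finish by polling the pod's phase with `kubectl get pod NAME -o jsonpath='{.status.phase}'`. This is a minimal sketch under stated assumptions: the pod name `mdsbox` is hypothetical (use the name defined in ./kubernetes/pod.yaml), and the pod is assumed to use `restartPolicy: Never` or `OnFailure`, because with the default `Always` a finished container is restarted and the phase stays `Running`. The helper `wait_for_pod` and the `KUBECTL` override are also hypothetical.

```shell
#!/bin/sh
# Sketch: poll a pod until it reaches a terminal phase.
# Assumes restartPolicy Never or OnFailure; wait_for_pod is hypothetical.
# KUBECTL may name another binary or a stub function (useful for testing).

wait_for_pod() {
  local pod attempts delay kubectl_cmd phase i
  pod="$1"
  attempts="${2:-60}"   # number of polls before giving up
  delay="${3:-10}"      # seconds between polls
  kubectl_cmd="${KUBECTL:-kubectl}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    phase="$("$kubectl_cmd" get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null)"
    case "$phase" in
      Succeeded) return 0 ;;   # pipeline container exited with status 0
      Failed) return 1 ;;      # pipeline container exited with an error
    esac
    # Pending, Running, or unknown: try again unless this was the last poll
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (pod name is hypothetical):
#   wait_for_pod mdsbox && kubectl logs mdsbox
```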
This project uses Parquet files instead of a database for storage. This is experimental, and the implementation will evolve over time.
- replace reg season schedule with 538 schedule
- add table for results
- add config options in dbt vars to ignore completed games
- make simulator only sim incomplete games
- add table for new ratings
- add config to use original or new ratings
- clean up dbt-osmosis
- clean up env vars + implement incremental builds
- clean up dev container plugins (remove irrelevant ones, add some others)
- add dbt tests on simulator tables to check that no numeric values (Elo ratings, home team win probabilities) are null
- add dbt tests
- add model descriptions
- change elo calculation to a udf
- make playoff elimination stuff a macro (param: schedule type)
The data contained within this project comes from 538, Basketball Reference, and DraftKings.