- This project consists of an R Shiny dashboard that displays recent NBA data, analyses, and trends, along with all of the processes behind the ELT pipeline that supports it.
- NBA data is web scraped in Python on a cron schedule run via ECS Fargate, and the scraped data is then stored in source tables in a PostgreSQL database.
- dbt Cloud then executes the SQL data transformations on a cron schedule that follows the ECS task, and also runs automated schema tests, data quality checks, and data validation assertions, primarily via the dbt_expectations package.
- An ML pipeline then runs in Python via ECS Fargate to predict team win percentages for that night's upcoming games.
- AWS Step Functions orchestrates all three tasks (scrape, transform, predict) so that they run in sequence.
- The Shiny server is built and deployed to ECS, where it queries the transformed SQL tables to display current stats, player metrics, gambling odds, and upcoming schedule data.
-
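A minimal sketch of the scrape-and-load step described above, under stated assumptions: it parses an inline sample table instead of a live NBA page, uses the standard library's `html.parser` as a stand-in for BeautifulSoup4, and writes to an in-memory SQLite database in place of PostgreSQL/SQLAlchemy so it runs without external services. The table name `boxscores_source` and its columns are hypothetical, not the project's actual schema; in the real pipeline the two functions could map onto a BeautifulSoup parse and a SQLAlchemy write.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample HTML; the real pipeline scrapes a live NBA stats page.
SAMPLE_HTML = """
<table id="stats">
  <tr><th>team</th><th>pts</th><th>opp_pts</th></tr>
  <tr><td>BOS</td><td>118</td><td>110</td></tr>
  <tr><td>LAL</td><td>104</td><td>109</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Whitespace between tags is ignored because _in_cell is False there.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return (header, data_rows) parsed from an HTML table."""
    parser = TableParser()
    parser.feed(html)
    header, *data = parser.rows
    return header, data


def load_source_table(conn, rows):
    """Replace the hypothetical source table with freshly scraped rows."""
    conn.execute("DROP TABLE IF EXISTS boxscores_source")
    conn.execute(
        "CREATE TABLE boxscores_source (team TEXT, pts INTEGER, opp_pts INTEGER)"
    )
    conn.executemany(
        "INSERT INTO boxscores_source (team, pts, opp_pts) VALUES (?, ?, ?)",
        [(team, int(pts), int(opp)) for team, pts, opp in rows],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    header, rows = scrape_table(SAMPLE_HTML)
    load_source_table(conn, rows)
    print(conn.execute("SELECT COUNT(*) FROM boxscores_source").fetchone()[0])
```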
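dbt data tests, including the generic tests that dbt_expectations provides, compile to a SELECT that returns the *failing* rows, and a test passes when that query returns nothing. The sketch below reproduces that pattern for a simple range check in plain Python and SQLite, loosely analogous to `dbt_expectations.expect_column_values_to_be_between`. It illustrates the mechanism only, not how dbt Cloud actually executes tests, and the `game_predictions` table and `home_win_pct` column are hypothetical.

```python
import sqlite3


def failing_rows_between(conn, table, column, min_value, max_value):
    """Return rows where `column` lies outside [min_value, max_value].

    Like a dbt data test, the query selects the *failing* records; an empty
    result means the check passes. NULLs are not returned because SQL
    comparisons with NULL are never true (a separate not_null test would
    cover them). Identifiers are interpolated directly, so pass trusted
    names only.
    """
    sql = f"SELECT * FROM {table} WHERE {column} < ? OR {column} > ?"
    return conn.execute(sql, (min_value, max_value)).fetchall()


def build_demo_db():
    """Create a hypothetical predictions table with one out-of-range value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE game_predictions (game_id INTEGER, home_win_pct REAL)")
    conn.executemany(
        "INSERT INTO game_predictions VALUES (?, ?)",
        [(1, 0.62), (2, 0.48), (3, 1.30)],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    failures = failing_rows_between(conn, "game_predictions", "home_win_pct", 0.0, 1.0)
    print("PASS" if not failures else f"FAIL: {len(failures)} failing row(s)")
```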
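The win-percentage prediction can be pictured as a binary classifier whose predicted probability of a home win is reported as a win %. This README does not say which model the project uses; the sketch below assumes logistic regression on a single hypothetical feature (home net rating minus away net rating) and fits it with plain batch gradient descent so it needs no dependencies. The training rows are made-up toy values, not NBA data. With scikit-learn, which the project lists, the closest counterpart would be `LogisticRegression().fit(X, y).predict_proba(X_new)[:, 1]`, which also applies L2 regularization by default.

```python
import math

# Hypothetical toy training data (NOT real NBA results):
# feature = home net rating minus away net rating; label = 1 if the home team won.
X_TRAIN = [[-7.0], [-4.0], [-2.0], [-1.0], [1.0], [2.0], [4.0], [7.0]]
Y_TRAIN = [0, 0, 0, 1, 0, 1, 1, 1]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_win_pct(weights, features):
    """Probability of a home win; weights[0] is the bias term."""
    score = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(score)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            # Gradient of log loss w.r.t. the score is (predicted - actual).
            error = predict_win_pct(weights, features) - label
            grad[0] += error
            for j, x in enumerate(features):
                grad[j + 1] += error * x
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    w = train_logistic(X_TRAIN, Y_TRAIN)
    for diff in (-6.0, 0.0, 6.0):
        pct = 100 * predict_win_pct(w, [diff])
        print(f"net rating diff {diff:+.1f}: home win % = {pct:.1f}")
```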
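The three-step orchestration can be expressed as an Amazon States Language definition in which each state names the next one. This sketch builds that JSON in Python; every ARN, subnet, task definition, and state name is a hypothetical placeholder. The ECS steps use the `arn:aws:states:::ecs:runTask.sync` integration, so Step Functions waits for each Fargate task to stop before moving on. Because dbt Cloud is not an AWS service, the middle step assumes a hypothetical Lambda function that triggers the dbt Cloud job; whether that state waits for the dbt run to finish depends on how the function is written. The resulting JSON string could be passed as the `definition` argument of the Boto3 `stepfunctions` client's `create_state_machine` call.

```python
import json

# Hypothetical placeholders; substitute real resource identifiers.
CLUSTER_ARN = "arn:aws:ecs:us-east-1:123456789012:cluster/nba-cluster"
SCRAPER_TASK_DEF = "nba-scraper"  # hypothetical task definition family
ML_TASK_DEF = "nba-ml-pipeline"  # hypothetical task definition family
SUBNETS = ["subnet-0123456789abcdef0"]
DBT_TRIGGER_LAMBDA = "trigger-dbt-cloud-job"  # hypothetical Lambda name


def ecs_fargate_state(task_definition, next_state=None):
    """Build a Task state that runs one Fargate task and waits for it to stop."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "LaunchType": "FARGATE",
            "Cluster": CLUSTER_ARN,
            "TaskDefinition": task_definition,
            "NetworkConfiguration": {
                "AwsvpcConfiguration": {
                    "Subnets": SUBNETS,
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition():
    """Return the state machine: scrape -> dbt Cloud -> ML predictions."""
    return {
        "Comment": "NBA ELT pipeline (illustrative sketch)",
        "StartAt": "ScrapeNBAData",
        "States": {
            "ScrapeNBAData": ecs_fargate_state(SCRAPER_TASK_DEF, "RunDbtCloudJob"),
            "RunDbtCloudJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": DBT_TRIGGER_LAMBDA},
                "Next": "PredictWinPct",
            },
            "PredictWinPct": ecs_fargate_state(ML_TASK_DEF),
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```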
Links to other Repos providing infrastructure for this Project
-
Main R Packages Used
- Shiny
- Tidyverse
- Plotly
- GT
- renv
-
Main Python Packages Used
- pandas
- BeautifulSoup4
- SQLAlchemy
- Boto3
- PRAW
- NLTK
- tweepy
- scikit-learn
- pytest