An attempt to find the greatest of all time using statistics. I created this repository to explore different methods for estimating the relative skill of NBA teams, inspired by FiveThirtyEight.com's CARMElo system. My aim is to dive deep into every method, derive the update equations, and discuss the pros and cons. I intend to cover the following methods:
- Elo and common extensions. [DONE]
- Assumed Density Filtering. [IN PROGRESS]
- Expectation Propagation (since this NBA data has already occurred, it can be used for batched inference).
- Extensions to EP (score difference).
The complexity of the methods increases down the list, according to the standard, universally accepted W3D difficulty rating system. You have been warned.
All the text is written in Markdown. To avoid rendering issues, I recommend viewing the notebooks on nbviewer using the links below. The second cell of each notebook contains some JavaScript that hides all the input code cells for a pleasant reading experience. If you're interested in the code, click the "here" link in that cell. The code for generating the graphical models is in scratch.ipynb.
https://nbviewer.jupyter.org/github/priyamtejaswin/nba-goat/blob/master/nb-elo_vanilla.ipynb
- Tracking NBA franchises through changes in names and cities.
- Explain Elo and its core assumptions, and apply vanilla Elo to NBA data.
- Extend the base model to account for score difference (mov-Elo).
- Extend the base model to account for home-court advantage (hca-Elo).
- Finish with an interactive visualisation of the Warriors and the Bulls.
- Discuss and segue to ADF and TrueSkill.
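A minimal Python sketch of the three Elo variants listed above (vanilla, mov-Elo, hca-Elo). The constants and the margin-of-victory multiplier are assumptions taken from FiveThirtyEight's published NBA Elo methodology (K = 20, 100 points of home-court advantage, multiplier (MOV + 3)^0.8 / (7.5 + 0.006 · elo_diff)); the notebook's own choices may differ. `expected_score`, `mov_multiplier` and `update` are hypothetical names, not functions from this repository.

```python
# Assumed constants (FiveThirtyEight-style); the notebook may use other values.
K = 20.0      # update step size
HCA = 100.0   # home-court advantage, in Elo points


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that team A beats team B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def mov_multiplier(mov: float, winner_elo_diff: float) -> float:
    """Margin-of-victory multiplier; damped when the favourite wins big."""
    return ((mov + 3.0) ** 0.8) / (7.5 + 0.006 * winner_elo_diff)


def update(r_home, r_away, home_pts, away_pts, use_mov=True, use_hca=True):
    """Return updated (home, away) ratings after one game (no ties in the NBA)."""
    hca = HCA if use_hca else 0.0
    p_home = expected_score(r_home + hca, r_away)
    s_home = 1.0 if home_pts > away_pts else 0.0
    mult = 1.0
    if use_mov:
        mov = abs(home_pts - away_pts)
        # Elo difference from the winner's point of view, including home court.
        if s_home == 1.0:
            diff = r_home + hca - r_away
        else:
            diff = r_away - (r_home + hca)
        mult = mov_multiplier(mov, diff)
    delta = K * mult * (s_home - p_home)
    # Zero-sum: whatever the home team gains, the away team loses.
    return r_home + delta, r_away - delta
```

For two evenly rated teams, vanilla Elo moves each rating by K/2 = 10 points: `update(1500, 1500, 110, 100, use_mov=False, use_hca=False)` returns `(1510.0, 1490.0)`. Keeping the update zero-sum keeps the league-average rating constant.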
Scroll down to the last cell for an interactive visualisation for two of my favorite teams!
https://nbviewer.jupyter.org/github/priyamtejaswin/nba-goat/blob/master/nb-adf_team.ipynb
- Start by explaining the two core operations (convolution, greater-than).
- Explain the clutter problem and the complexity involved in calculating the exact posterior.
- Derive the parameter updates for the clutter problem using ADF.
- Visualise the update procedure.
- Set up the skill estimation problem in the context of ADF.
- Derive updates using ADF.
- Apply to NBA data.
- Compare against mov-Elo from the previous notebook.
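The ADF skill update for a single game can be sketched in the TrueSkill style mentioned at the end of the Elo notebook. Assumptions: each team's skill is Gaussian N(mu, sigma^2), its performance adds N(0, beta^2) noise (the convolution step), the result is observed as winner performance > loser performance (the greater-than step), draws are ignored, and beta = 25/6 is borrowed from TrueSkill's defaults. `adf_update` is a hypothetical helper, not code from the notebook.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def adf_update(winner, loser, beta=25 / 6):
    """One ADF step for a single game; winner and loser are (mu, sigma) tuples."""
    mu_w, sig_w = winner
    mu_l, sig_l = loser
    # Convolution: performance = skill + N(0, beta^2), so the variances add.
    c2 = sig_w ** 2 + sig_l ** 2 + 2 * beta ** 2
    c = c2 ** 0.5
    t = (mu_w - mu_l) / c
    # Greater-than: moment-match the truncated Gaussian of the performance gap.
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # mean correction
    w = v * (v + t)                            # variance correction, in (0, 1)
    new_w = (mu_w + sig_w ** 2 / c * v, sig_w * (1 - sig_w ** 2 / c2 * w) ** 0.5)
    new_l = (mu_l - sig_l ** 2 / c * v, sig_l * (1 - sig_l ** 2 / c2 * w) ** 0.5)
    return new_w, new_l
```

With identical priors, the winner's mean rises by exactly as much as the loser's falls, and both standard deviations shrink because the game carries information about each team. Unlike Elo, the size of the step is driven by the current uncertainty sigma rather than a fixed K.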
Scroll down to the last cell to view ADF in action while estimating the true mean in noise!
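A one-dimensional sketch of ADF on the clutter problem, assuming the setup from Minka's EP paper: each observation is N(theta, 1) with probability 1 - w and clutter N(0, a) otherwise, with prior theta ~ N(0, 100). The values w = 0.5 and a = 10 are assumed. Each step multiplies the current Gaussian by one likelihood term and moment-matches the resulting two-component mixture back to a single Gaussian. `adf_clutter` and the synthetic data are illustrative only.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def adf_clutter(xs, w=0.5, a=10.0, prior_var=100.0):
    """Return the Gaussian approximation (mean, variance) after each point."""
    m, v = 0.0, prior_var
    history = []
    for x in xs:
        signal = (1 - w) * normal_pdf(x, m, v + 1)
        clutter = w * normal_pdf(x, 0.0, a)
        r = signal / (signal + clutter)  # posterior prob. that x is not clutter
        k = v / (v + 1)                  # gain of a plain Gaussian update
        m_new = m + r * k * (x - m)
        # Variance of the matched mixture (uses the old mean m).
        v = v - r * v * k + r * (1 - r) * (k * (x - m)) ** 2
        m = m_new
        history.append((m, v))
    return history


# Synthetic data: true theta = 2, half of the points are clutter.
random.seed(0)
theta = 2.0
data = [random.gauss(theta, 1.0) if random.random() > 0.5
        else random.gauss(0.0, math.sqrt(10.0)) for _ in range(200)]
mean, var = adf_clutter(data)[-1]
```

Because each point is absorbed once and never revisited, the estimate depends on the order in which the data arrives; removing that order dependence is part of the motivation for Expectation Propagation.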
- Add a chart/figure/timeline: a brief history of NBA franchises detailing the changes in names. Use the DataFrame or https://www.basketball-reference.com/teams/ for the actual data. For team names with abbreviations, parse http://www.apbr.org/abbreviations.html using the following regex: `/[A-Z]{3}\ \-\ [A-Za-z\ \-\\\/]+\([A-Z\ \\\-\/]+\)/g`
- Think of good examples to demonstrate the strengths and weaknesses of the methods. The Bulls got MJ, then MJ left. Shaq left the Lakers at their peak and then won again with the Heat in 2006. The Warriors' turnaround. Jason Kidd started with the Mavs, moved on (via the Suns) to the Nets, and then returned to the Mavs.
- Account for carry-over performance after every season.
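Applying the abbreviation regex in Python might look like the sketch below. It assumes each entry on the apbr page reads `ABC - Team Name (XYZ)`; the sample string is made up, and the parenthesised code is kept as an opaque `tag` because its meaning on that page is not stated here. The JavaScript-style `/.../g` delimiters are dropped, and `re.findall` plays the role of the `g` flag. `parse_abbreviations` is a hypothetical helper.

```python
import re

# The regex from the TODO item, unchanged, as a Python raw string.
ABBR_RE = re.compile(r"[A-Z]{3}\ \-\ [A-Za-z\ \-\\\/]+\([A-Z\ \\\-\/]+\)")


def parse_abbreviations(text):
    """Return (abbreviation, team name, tag) triples found in `text`."""
    rows = []
    for match in ABBR_RE.findall(text):
        abbr, rest = match.split(" - ", 1)   # "ABC" and "Team Name (XYZ)"
        name, tag = rest.rsplit("(", 1)      # the name class cannot contain "("
        rows.append((abbr, name.strip(), tag.rstrip(")").strip()))
    return rows


# Made-up sample in the assumed format.
sample = "ATL - Atlanta Hawks (NBA)\nBLB - Baltimore Bullets (BAA)"
rows = parse_abbreviations(sample)
```

Keeping the original pattern verbatim and splitting the match afterwards avoids adding capture groups that the page's real layout might not support.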