NBA-Win-Probability

In this project, I built a classification model in Python that predicts the outcome of an NBA game from features of a single play (e.g., the score at the time and the type of play). To do this, I first parsed and cleaned the play-by-play data from the CSV file. This involved deciding which features to include and determining, for each play, the team that executed it, the score at that point, and the eventual winner of the game. After creating the feature vectors and their corresponding outcomes, I split them into a training set and a testing set, trained the classifier on the training set, and measured the model’s accuracy on the testing set. From there, I experimented with which features to include in the vector, checking whether the accuracy of the model improved. After many trials, I settled on a vector consisting of the number of seconds elapsed since the beginning of the game, the away team’s score, the home team’s score, the event type, and the team the play belonged to (e.g., OKC’s rebound or ORL’s turnover). I then experimented with the type of classifier and settled on a neural network classifier. Finally, I pickled the model so that I didn’t need to retrain the classifier every time.
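The feature vector described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `Play` fields, the `EVENT_TYPES` list, and all function names are hypothetical, and the elapsed-time calculation assumes standard NBA timing (12-minute quarters, 5-minute overtime periods) and a data source that gives the period and the time remaining in it.

```python
import random
from dataclasses import dataclass

QUARTER_SECONDS = 12 * 60   # length of a regulation period
OVERTIME_SECONDS = 5 * 60   # length of an overtime period

# Hypothetical event vocabulary; the real play-by-play data may use other labels.
EVENT_TYPES = ["shot", "miss", "rebound", "turnover", "foul", "free throw"]


@dataclass
class Play:
    period: int              # 1-4 for quarters, 5 and up for overtime periods
    seconds_remaining: int   # seconds left in the current period
    away_score: int
    home_score: int
    event_type: str
    by_home_team: bool       # True if the play belongs to the home team


def seconds_elapsed(period, seconds_remaining):
    """Seconds since tip-off, given the period and the clock in that period."""
    if period <= 4:
        finished = (period - 1) * QUARTER_SECONDS
        return finished + (QUARTER_SECONDS - seconds_remaining)
    finished = 4 * QUARTER_SECONDS + (period - 5) * OVERTIME_SECONDS
    return finished + (OVERTIME_SECONDS - seconds_remaining)


def to_feature_vector(play):
    """Encode a play as [seconds, away score, home score, event id, team flag]."""
    return [
        seconds_elapsed(play.period, play.seconds_remaining),
        play.away_score,
        play.home_score,
        EVENT_TYPES.index(play.event_type),
        1 if play.by_home_team else 0,
    ]


def split_train_test(vectors, labels, test_fraction=0.2, seed=0):
    """Shuffle (vector, label) pairs and split them into train and test lists."""
    pairs = list(zip(vectors, labels))
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_fraction)
    return pairs[n_test:], pairs[:n_test]
```

With made-up values, `to_feature_vector(Play(4, 3, 100, 98, "shot", True))` gives `[2877, 100, 98, 0, 1]`. The integer event id is just the simplest encoding for a sketch; a one-hot encoding may suit a neural network better.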

The next step was to use this model to create a visualization that lets users see how a team’s win probability changes after each play. I used the pickled classifier to generate a JSON file containing a prediction of the game’s outcome after each play. Finally, using D3, JavaScript, HTML, and CSS, I created an interactive line graph displaying the home team’s win probability after each play.
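The JSON-generation step might look something like the sketch below. It assumes the pickled classifier exposes a scikit-learn-style `predict_proba()` whose second column is the probability that the home team wins; that column order, the record keys, and the function names are assumptions made for illustration, not the project's actual format.

```python
import json


def build_prediction_records(model, plays):
    """Return one record per play with the home team's win probability.

    `model` is assumed to expose a scikit-learn-style predict_proba() that
    returns, for each input row, [P(home loses), P(home wins)].
    `plays` is a list of dicts with hypothetical keys "features" and
    "description".
    """
    probabilities = model.predict_proba([play["features"] for play in plays])
    records = []
    previous = None
    for play, row in zip(plays, probabilities):
        home_win = float(row[1])
        records.append({
            "description": play["description"],
            "home_win_probability": round(home_win, 4),
            # Change relative to the previous play; 0.0 for the first play.
            "change": round(home_win - previous, 4) if previous is not None else 0.0,
        })
        previous = home_win
    return records


def write_predictions(model, plays, path):
    """Serialize the per-play predictions to a JSON file for the front end."""
    with open(path, "w") as handle:
        json.dump(build_prediction_records(model, plays), handle, indent=2)
```

The model itself would be restored beforehand with `pickle.load()` on the saved file, and the resulting JSON is what the D3 code would read to draw the line.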

Below is an example of a visualization that can be created.

[Image: example win probability visualization (figures 1 and 2)]

The visualization above shows how easily the momentum, or win probability, of a team can change after each play. Specifically, the graphs display how the Orlando Magic's win probability changes from the last 3 minutes of the 4th quarter through the first 2 minutes of the first overtime period in the double-overtime game against the Oklahoma City Thunder on October 30, 2015 (when OKC came back from 15 down in the 4th quarter). After Oladipo hit what he probably thought was the dagger, a step-back three with only 3 seconds left and no timeouts remaining for the Thunder, the Magic had a 64% chance of winning the game (figure 1). However, Westbrook defied the odds by banking in a running 3-point shot from nearly half court with 0.7 seconds left on the clock, swinging the odds back toward the Thunder by almost 30% (figure 2)!

When creating the visualization, I wanted viewers to be able to quickly see how Orlando's win probability changed as the game progressed, so I added the line. To make it easier for users to approximate the time left in the game, I added major (black) x-axis tick marks to mark the start and end of each period, and minor (grey) tick marks to mark each minute passed. I also added tick marks at 0, 0.5, and 1 on the y-axis to give viewers a quick way to estimate the current win probability of the home team. Next, to give a more detailed explanation of each point, I added a tooltip that displays the current score, play clock, win probability, change in win probability, and a description of the play. I made the tooltip slightly transparent so that it didn't completely obstruct the line graph. After realizing it was too difficult to hover the mouse directly on top of a point, I implemented a Voronoi overlay so that the mouse only needs to be near a point for its tooltip to show up (this is shown in both figures). Even with the Voronoi overlay, I found it difficult to select the exact point I wanted because many points were too close together. Because of this, I added a slider bar at the bottom of the graph that lets the user zoom into a specific segment of the game, making it easier to show the tooltip of a specific point and to view subtle changes in win probability.
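The reasoning behind the Voronoi overlay and the zoom slider can be illustrated outside of D3. A point's Voronoi cell is the region of the chart closer to that point than to any other, so hovering a cell amounts to a nearest-point lookup, and zooming helps because it spreads crowded points further apart in pixels. The Python sketch below only illustrates that idea; the actual visualization is written in D3, and the chart size, play values, and function names here are made up.

```python
import math


def scale_x(seconds, domain, width):
    """Linearly map game time (seconds) to a horizontal pixel position."""
    start, end = domain
    return (seconds - start) / (end - start) * width


def nearest_point(points, mouse):
    """Return the index of the point closest to the mouse position.

    This is what hovering a Voronoi cell amounts to: a point's cell is the
    region of the chart that is closer to that point than to any other.
    """
    return min(
        range(len(points)),
        key=lambda i: math.hypot(points[i][0] - mouse[0], points[i][1] - mouse[1]),
    )


# Made-up plays: (seconds elapsed, home win probability).
plays = [(2860.0, 0.50), (2877.0, 0.64), (2879.3, 0.35)]
WIDTH, HEIGHT = 960, 300  # hypothetical chart size in pixels


def to_pixels(domain):
    """Convert every play to (x, y) pixel coordinates for the given time window."""
    return [(scale_x(t, domain, WIDTH), (1 - p) * HEIGHT) for t, p in plays]


full_game = to_pixels((0, 2880))    # all four quarters on screen
zoomed = to_pixels((2700, 2880))    # slider narrowed to the last 3 minutes

# Horizontal gap between the last two plays, before and after zooming.
gap_full = full_game[2][0] - full_game[1][0]
gap_zoomed = zoomed[2][0] - zoomed[1][0]
```

In this example the two final plays sit less than one pixel apart horizontally when the whole game is shown, and about twelve pixels apart once the window is narrowed to the last three minutes, which is why the slider makes individual points easier to select.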