New for 2023: player retention calculation
Volleyball, a much-beloved sport, is significantly lacking in advanced statistics. This project is an attempt to bring the widely used elo rating system to men's collegiate volleyball.
The main input for this elo model is wins/losses, but sets won, home court advantage, and a distance-traveled disadvantage are also taken into account. (Points data is in the data set, but currently not used in the elo calculation.)
Huge thanks to Jonathan Bates @mpsf_bias for contributing thoughts and tons of data.
Always looking for more data. Let me know if you are interested in helping with data collection.
[Games](inputs/VBelo - games.csv) - A list of games where at least one team is eligible to compete for the NCAA D-I/II Championship. Currently, the dataset includes seasons starting from 2020.
Field | Description |
---|---|
date | Date the game was played. |
season | Calendar year season for the game (for future data). |
home | Displays home teams. If neutral court, field is blank. |
neutral | If played on neutral court, value = 1. |
playoff | If game is a playoff game, value = 1. |
team1 | Away team. If neutral court, team with "short_name" that comes alphabetically first. |
team2 | Home team. If neutral court, team with "short_name" that comes alphabetically second. |
result_team# | Win = 1. Loss = 0. |
sets_team# | Number of sets team# won. |
points_team# | Number of total points team# scored in all sets. |
elo_start_team# | The elo rating at the beginning of the match for team#. |
probability_team# | Probability that team# will win the game given both teams' starting elo ratings. |
elo_end_team# | The elo rating at the end of the match after match results are complete. |
[Teams List](inputs/VBelo - teams.csv) - A complete list of D-I, D-II, and D-III men's volleyball programs. Select NAIA and NCCAA schools added if they play a Championship eligible team.
Field | Description |
---|---|
short_name | School name used in games spreadsheet. |
full_name | Full name of the school, in case it is needed or unclear. |
division | D-I, D-II, D-III, NAIA, or NCCAA. |
mascot | ...This data could be helpful. |
conference | For NCAA teams, the conference that they play men's volleyball in. For NAIA, value = NAIA. |
elo | Placeholder for elo rating as matches are played. In output, this is where elo is stored. |
location | Longitude and latitude for home games. |
eligible | Teams that compete for the D-I/II championship and therefore have their elo tracked, value = 1. |
The goal is to include all NCAA D-I and D-II teams since they are all competing for the same national championship. This includes the following conferences:
- Big West
- Conference Carolinas
- EIVA
- MIVA
- MPSF
- SIAC
- Independent Teams
Sadly, there are not enough resources to collect all of the data needed to include all of collegiate men's volleyball. (If someone wants to collect that data, they could use this same code, though.)
Since D-I and D-II teams often play non-conference games against D-III, NAIA, or NCCAA opponents, those opponents have a static role in the model (i.e. their elo is always the same). Thanks to volleyball stat nerd Jonathan, the average winning percentage of these teams vs. D-I and D-II programs was used to get their static elo ratings.
- D-III = 1419
- NAIA/NCCAA = 1373
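The README does not say exactly how winning percentages were converted into these static ratings. One common approach, shown here only as a sketch, is to invert the standard elo expectation formula. The 400-point scale, the 1500 baseline for an average D-I/II opponent, and both function names are assumptions for illustration, not taken from the project's code.

```python
import math

ELO_SCALE = 400.0  # assumed: the conventional elo scale


def elo_gap_from_win_pct(win_pct: float) -> float:
    """Rating gap (team minus opponent) implied by a win probability."""
    if not 0.0 < win_pct < 1.0:
        raise ValueError("win_pct must be strictly between 0 and 1")
    return -ELO_SCALE * math.log10(1.0 / win_pct - 1.0)


def win_pct_from_elo_gap(gap: float) -> float:
    """Win probability implied by a rating gap (team minus opponent)."""
    return 1.0 / (1.0 + 10 ** (-gap / ELO_SCALE))


# Win probability implied for a 1419-rated team against a 1500-rated opponent,
# under the assumptions above.
implied = win_pct_from_elo_gap(1419 - 1500)
print(f"{implied:.2f}")
```

Under these assumptions, a static rating below 1500 corresponds to a team that wins less than half of its games against an average tracked program.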
Currently, sets won/lost are taken into account. Eventually, points will be added too.
There is a seemingly significant advantage to playing at home in NCAA Men's Volleyball. Between 2015 and 2020, home teams won a little over 63% of the time. (All credit for this goes to Jonathan Bates.) Since 2021 has seen a slight decrease in home court advantage, the adjustment in the model for being the home team is equal to 50 elo points. This is equivalent to about an 8% advantage. While this is below the "actual" home court advantage, it leaves room for other factors, such as travel.
Traveling is hard, so I want the model to reflect that. Based on mean squared error and some trial and error, the current penalty for distance traveled is as follows:
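As a rough illustration of what a 50-point home adjustment does to a win probability, here is a minimal sketch using the standard elo expectation formula. The 400-point scale and the function name `win_probability` are assumptions; the project's own code may differ.

```python
HOME_ADVANTAGE = 50.0  # elo points added to the home team, per the text above
ELO_SCALE = 400.0      # assumed: the conventional elo scale


def win_probability(elo_a: float, elo_b: float) -> float:
    """Standard elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / ELO_SCALE))


# Two evenly rated teams, with team2 playing at home.
away_elo, home_elo = 1500.0, 1500.0
p_home = win_probability(home_elo + HOME_ADVANTAGE, away_elo)
print(f"home win probability: {p_home:.2f}")
```

With evenly rated teams, this sketch puts the home side at roughly 57%, which is a smaller edge than the historical 63% home win rate, as the paragraph above intends.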
- -1 elo for every 250 miles traveled to the venue
- There is a max penalty of -25 elo (no one has reached this yet)
- For neutral site games, both teams could get this travel penalty (depending on how far the match is from both schools)
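The travel rule above can be sketched as a small function. This version assumes the penalty scales continuously with distance (rather than in whole 250-mile steps), which the text does not specify; the function and constant names are hypothetical.

```python
MILES_PER_ELO_POINT = 250.0  # -1 elo for every 250 miles
MAX_TRAVEL_PENALTY = 25.0    # penalty is capped at -25 elo


def travel_penalty(miles: float) -> float:
    """Elo adjustment (zero or negative) for a team that traveled `miles`."""
    miles = max(0.0, miles)  # guard against negative input
    return -min(miles / MILES_PER_ELO_POINT, MAX_TRAVEL_PENALTY)


# A road game: only the away team traveled.
print(travel_penalty(1000))  # 1000 miles -> -4.0

# A neutral-site game: both teams may be penalized, each for its own distance.
team1_adj = travel_penalty(500)
team2_adj = travel_penalty(2000)
print(team1_adj, team2_adj)  # -2.0 -8.0
```

Under this reading, the cap is only reached at 6,250 miles, which is consistent with the note that no team has hit it yet.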
Obviously, no program has the exact same roster two years in a row. To help adjust for graduates and/or transfers (and potentially elo inflation), at the beginning of each season every team's elo is reverted one quarter of the way back to 1500 (the average elo).
Programs new to the D-I/II level starting in 2022 are given a starting elo of 1419 (the static elo given to D-III teams). They start as if they were D-III teams, and their elo then changes with every game they play.
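A minimal sketch of that preseason adjustment, reading "reverted by 1/4" as moving each rating a quarter of the way toward 1500. That reading and the names below are assumptions.

```python
MEAN_ELO = 1500.0  # the average elo
REVERSION = 0.25   # fraction of the gap to the mean removed each offseason


def preseason_elo(end_of_season_elo: float) -> float:
    """Pull a team's rating a quarter of the way back toward the mean."""
    return end_of_season_elo + REVERSION * (MEAN_ELO - end_of_season_elo)


print(preseason_elo(1700.0))  # strong team drifts down: 1650.0
print(preseason_elo(1400.0))  # weak team drifts up: 1425.0
```

Teams above 1500 lose a quarter of their surplus and teams below 1500 recover a quarter of their deficit, so the ordering of teams is preserved while the spread narrows.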
The K-factor is currently 30. That value gave the best mean squared error (~0.17) in combination with the other factors. As more data is collected, this will be revisited.
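If the value of 30 is read as the elo K-factor (an assumption, since the text does not name it), a basic rating update looks like the sketch below. It uses the textbook win/loss update only; how the model folds in sets won, home court, and travel is not described here, so those are left out. All names are hypothetical.

```python
K = 30.0           # assumed to be the K-factor referred to above
ELO_SCALE = 400.0  # assumed: the conventional elo scale


def expected_score(elo_a: float, elo_b: float) -> float:
    """Standard elo expectation that team A beats team B."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / ELO_SCALE))


def update(elo_a: float, elo_b: float, a_won: bool) -> tuple:
    """Return the post-match ratings (elo_a, elo_b) for a plain win/loss update."""
    actual = 1.0 if a_won else 0.0
    delta = K * (actual - expected_score(elo_a, elo_b))
    # Zero-sum: whatever A gains, B loses.
    return elo_a + delta, elo_b - delta


print(update(1500.0, 1500.0, a_won=True))  # (1515.0, 1485.0)
```

A larger K makes ratings react faster to each result, and a smaller K makes them more stable, which is why the value is tuned against prediction error.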