/skate_predict

Predicting figure skating world championship ranking from season performances


Predict ranking of figure skating world championship from earlier events in the season

[Figure: Event boxplot]

This is my personal project to predict the ranking of skaters in the annual figure skating world championship. The obvious way to rank skaters is to average each skater's scores from earlier competition events in the season and rank the skaters from highest to lowest average. However, one potential problem with this approach is that the scores are averaged over different events, and no two events are the same (think different judges, ice conditions, or even the altitudes at which the events took place). As the box plot of scores for the male skaters in the 2017 season shows, the center and spread of scores can be remarkably different from one event to another.
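As a minimal sketch of the season-average baseline described above, the snippet below ranks skaters by their mean score and also summarizes each event's center and range. The toy scores and the column names `skater`, `event`, and `score` are assumptions for illustration, not the project's actual data or schema.

```python
import pandas as pd

# Hypothetical toy scores; names and values are made up for illustration.
scores = pd.DataFrame({
    "skater": ["A", "A", "B", "B", "C"],
    "event":  ["E1", "E2", "E1", "E3", "E2"],
    "score":  [250.0, 270.0, 240.0, 290.0, 255.0],
})

# Baseline: average each skater's scores over the season, highest first.
season_avg = scores.groupby("skater")["score"].mean().sort_values(ascending=False)
baseline_ranking = season_avg.index.tolist()

# Per-event summary: roughly the information a box plot conveys per event.
event_summary = scores.groupby("event")["score"].agg(["median", "min", "max"])

print(baseline_ranking)
print(event_summary)
```

Here skater B ranks first largely because of one high score at event E3, which is exactly the kind of event dependence the averaging approach cannot account for.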

Therefore, I came up with different ranking models that tease apart the skater effect (how good a skater intrinsically is) from the event effect (how an event affects a skater's score). All models are coded using numpy and pandas, along with some built-in Python modules (such as itertools).
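One way to picture this separation, assuming an additive form (score ≈ season mean + skater effect + event effect) fitted with a ridge penalty, is sketched below. The additive form, the penalty strength `lam`, the toy data, and the column names are all assumptions for illustration; the write-ups linked below define the actual models.

```python
import numpy as np
import pandas as pd

# Hypothetical toy scores: each skater competes in two of three events.
scores = pd.DataFrame({
    "skater": ["A", "A", "B", "B", "C", "C"],
    "event":  ["E1", "E2", "E1", "E3", "E2", "E3"],
    "score":  [250.0, 270.0, 240.0, 290.0, 255.0, 285.0],
})

# Naive ranking by season average, for comparison.
average_ranking = (
    scores.groupby("skater")["score"].mean().sort_values(ascending=False).index.tolist()
)

# One-hot encode skaters and events; every row has one skater and one event set to 1.
X = pd.get_dummies(scores[["skater", "event"]], dtype=float)
A = X.to_numpy()
y_centered = scores["score"].to_numpy() - scores["score"].mean()

lam = 1.0  # ridge penalty strength (assumed value)
# Closed-form ridge solution: (A^T A + lam * I)^(-1) A^T y
coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_centered)
effects = pd.Series(coef, index=X.columns)

# Rank skaters by their estimated effect, with the event effects removed.
skater_effects = effects[effects.index.str.startswith("skater_")].sort_values(ascending=False)
model_ranking = [name.split("_", 1)[1] for name in skater_effects.index]

print(average_ranking)  # ['C', 'B', 'A']
print(model_ranking)    # ['A', 'B', 'C']
```

On this toy data the two rankings are reversed: skater C has the highest average only because both of C's events were high-scoring, while A outscored every opponent within each shared event.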

The project consists of multiple parts:

  • Part 1: simpler linear models with ridge regression (analysis, write-up) [Figure: Ranking comparisons]

  • Part 2: hybrid model (single-factor) learned by gradient descent, with model penalization and early stopping (analysis, write-up) [Figure: Gradient descent animation]

  • Part 3: multi-factor model learned by gradient descent (analysis, write-up) [Figure: Gradient descent animation]

  • Part 4: combine multiple latent factors to rank skaters using logistic regression (analysis, write-up) [Figure: Gradient ascent animation]

  • Part 5: train latent factors in sequence instead of all at once (analysis, write-up) [Figure: Sequential gradient descent animation]

  • Part 6: combine different rankings and final benchmark on test set (analysis, write-up) [Figure: Test benchmark for male skaters]
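To give a flavor of the gradient descent parts, here is a generic sketch of fitting a single latent factor per skater and per event with an L2 penalty and early stopping on held-out scores. It assumes a purely multiplicative form (score ≈ skater factor × event factor), synthetic data, and arbitrary hyperparameters (`lr`, `lam`, `patience`); it is not the hybrid model from Part 2, whose exact form is given in the write-ups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed): score = skater factor * event factor + noise.
n_skaters, n_events = 8, 5
true_skater = rng.uniform(5.0, 10.0, n_skaters)
true_event = rng.uniform(20.0, 30.0, n_events)
skater_idx = np.repeat(np.arange(n_skaters), n_events)
event_idx = np.tile(np.arange(n_events), n_skaters)
y = true_skater[skater_idx] * true_event[event_idx] + rng.normal(0.0, 5.0, skater_idx.size)

# Hold out roughly a quarter of the scores to decide when to stop.
is_val = rng.random(y.size) < 0.25
train = ~is_val
y = y / y[train].mean()  # rescale so that factors near 1 are a sensible start


def mse(s, e, mask):
    """Mean squared error of the factor model on the rows selected by mask."""
    pred = s[skater_idx[mask]] * e[event_idx[mask]]
    return float(np.mean((y[mask] - pred) ** 2))


s, e = np.ones(n_skaters), np.ones(n_events)
lr, lam, patience, max_steps = 0.2, 1e-3, 20, 2000  # assumed hyperparameters
n_train = int(train.sum())

val_history = []
best_val_loss, best_s, best_e, since_best = np.inf, s.copy(), e.copy(), 0
for step in range(max_steps):
    # Early stopping: keep the factors with the lowest validation loss so far.
    val_loss = mse(s, e, is_val)
    val_history.append(val_loss)
    if val_loss < best_val_loss:
        best_val_loss, best_s, best_e, since_best = val_loss, s.copy(), e.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:
            break

    # Gradients of the penalized mean squared error on the training rows.
    resid = y[train] - s[skater_idx[train]] * e[event_idx[train]]
    grad_s = -2.0 * np.bincount(
        skater_idx[train], weights=resid * e[event_idx[train]], minlength=n_skaters
    ) / n_train + 2.0 * lam * s
    grad_e = -2.0 * np.bincount(
        event_idx[train], weights=resid * s[skater_idx[train]], minlength=n_events
    ) / n_train + 2.0 * lam * e
    s = s - lr * grad_s
    e = e - lr * grad_e

# Rank skaters by their learned factor, best first (indices into the skater list).
ranking = np.argsort(-best_s)
print(ranking, round(best_val_loss, 5))
```

The validation loss is checked before each update, so the stored factors always correspond to the best point seen on held-out data rather than the last iteration.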

Data for the project were scraped from the score websites of the International Skating Union (www.isuresults.com). The code used to scrape and clean the scores is found in the data_processing notebook. The cleaned scores are found in the scores subfolder, and the output visualizations in the viz subfolder.

For any questions or feedback, please don't hesitate to contact me here or on Medium!