
Multilevel Beta regression to infer MotoGP riders' skill while accounting for the constructor advantage.

Primary language: R. License: MIT.

Bayesian Analysis to Infer MotoGP Riders' Skill

This repository implements a Bayesian model that quantifies, in a racing context, the skill of the rider and separates it from the advantage provided by the constructor.

To do so, we use a multilevel Beta regression that models individual race success as the proportion of outperformed competitors (POC), following van Kesteren and Bergkamp (2022).
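As an illustration of the outcome variable, the sketch below computes the POC of a rider from the finishing position, assuming it is the share of the other n - 1 classified riders that finished behind them, i.e. (n - position) / (n - 1). This formula is an assumption, although it is consistent with the POC values in the data table below. How retirements are handled is not covered here, and the function name `poc` is hypothetical.

```python
def poc(position: int, n_classified: int) -> float:
    """Proportion of outperformed competitors for a classified finisher.

    Assumed definition: the share of the other n_classified - 1 riders
    that finished behind this rider.
    """
    if n_classified < 2:
        raise ValueError("need at least two classified riders")
    if not 1 <= position <= n_classified:
        raise ValueError("position must be between 1 and n_classified")
    return (n_classified - position) / (n_classified - 1)


# Hypothetical race with 15 classified riders: winner, 2nd, 3rd and last.
for position in (1, 2, 3, 15):
    print(position, round(poc(position, 15), 2))
```

The winner always gets a POC of 1 and the last classified rider a POC of 0, whatever the field size, which makes races with different numbers of finishers comparable.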

Friendly Reminder

If you use this repository or take inspiration from it, please cite it with this link: santurini/Bayesian-Analysis-of-MotoGP-Riders-Skill

Your support is truly appreciated. Feel free to contact me through the following links or by email:

Repository content

The Data

The model was applied to data from the 2016-2021 MotoGP seasons, scraped from the official MotoGP website and available as CSV files at the following link. All the transformations can be found in the code and are explained in the report.

These are the first five records of the dataset:

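| Year | Sequence | Rider | Constructor | Position | Weather | POC | POC smoothed |
|------|----------|-------|-------------|----------|---------|-----|--------------|
| 2016 | 1 | Jorge Lorenzo | Yamaha Factory | 1 | Dry | 1.00 | 0.97 |
| 2016 | 1 | Andrea Dovizioso | Ducati | 2 | Dry | 0.93 | 0.90 |
| 2016 | 1 | Marc Marquez | Repsol Honda Team | 3 | Dry | 0.86 | 0.83 |
| 2016 | 1 | Valentino Rossi | Yamaha Factory | 4 | Dry | 0.79 | 0.77 |
| 2016 | 1 | Dani Pedrosa | Repsol Honda Team | 5 | Dry | 0.71 | 0.70 |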
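A POC of exactly 0 or 1 lies outside the support of the Beta distribution, so the values have to be squeezed into the open interval (0, 1) before modeling. One common choice is the transformation of Smithson and Verkuilen (2006), y' = (y(n - 1) + 0.5) / n. The sketch below assumes that transformation with n = 15 classified riders (inferred from the 1/14 spacing of the POC column, not stated in the data description) and checks it against the five rows above. The exact transformation used in this repository is the one in its code and report.

```python
def smooth(y: float, n: int) -> float:
    """Squeeze a proportion from [0, 1] into (0, 1): (y * (n - 1) + 0.5) / n."""
    return (y * (n - 1) + 0.5) / n


# Assumption: 15 classified riders in the first 2016 race.
N = 15
rows = {  # position -> (POC, POC smoothed) as printed in the table
    1: (1.00, 0.97),
    2: (0.93, 0.90),
    3: (0.86, 0.83),
    4: (0.79, 0.77),
    5: (0.71, 0.70),
}
for position, (poc_printed, smoothed_printed) in rows.items():
    y = (N - position) / (N - 1)  # raw POC of a classified finisher
    assert round(y, 2) == poc_printed
    assert round(smooth(y, N), 2) == smoothed_printed
print("transformation reproduces the printed values")
```

The shift of 0.5 / n is small, so the ordering of riders is preserved while the extreme values 0 and 1 move just inside the interval.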

The Model

The proposed model is a multilevel Beta regression on the smoothed POC. As mentioned above, however, what we are mainly interested in is the mean of the Beta distribution, which is modeled through the sum of the rider skill and the constructor advantage.

For each rider r and each constructor c we specify two parameters: a long-term skill (or advantage) and a seasonal one, specific to season s.

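$$ y_{rcs} \sim \mathrm{Beta}(\mu_{rcs}, \ \phi) $$

where $y_{rcs}$ is the smoothed POC of rider r riding for constructor c in season s, and $\phi$ is the precision parameter (the larger $\phi$, the smaller the dispersion around $\mu_{rcs}$). Since the mean must lie in (0, 1), it is linked to the effects through the logit function:

$$ \mathrm{logit}(\mu_{rcs}) = \beta_r + \beta_{rs} + \beta_c + \beta_{cs} $$

$$ \beta_r \sim N(0, \sigma_r^2) $$

$$ \beta_{rs} \sim N(0, \sigma_{rs}^2) $$

$$ \beta_c \sim N(0, \sigma_c^2) $$

$$ \beta_{cs} \sim N(0, \sigma_{cs}^2) $$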
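To make the generative structure concrete, here is a minimal simulation sketch with NumPy. It assumes a logit link between the sum of effects and the Beta mean (the default link of the Beta family in brms) and the usual mean-precision parameterisation, in which Beta(mu, phi) has shape parameters mu * phi and (1 - mu) * phi. All sizes, standard deviations and the precision are hypothetical values chosen for illustration, not estimates from this repository.

```python
import numpy as np

rng = np.random.default_rng(2016)

# Hypothetical dimensions and hyperparameters, for illustration only.
n_riders, n_constructors, n_seasons = 20, 6, 6
sigma_r, sigma_rs, sigma_c, sigma_cs = 1.0, 0.3, 0.6, 0.2
phi = 5.0  # precision: larger values mean less spread around mu

# Long-term and seasonal effects, drawn from zero-mean normal distributions.
beta_r = rng.normal(0.0, sigma_r, size=n_riders)
beta_rs = rng.normal(0.0, sigma_rs, size=(n_riders, n_seasons))
beta_c = rng.normal(0.0, sigma_c, size=n_constructors)
beta_cs = rng.normal(0.0, sigma_cs, size=(n_constructors, n_seasons))


def simulate_result(r: int, c: int, s: int) -> float:
    """Draw one smoothed POC for rider r on constructor c in season s."""
    eta = beta_r[r] + beta_rs[r, s] + beta_c[c] + beta_cs[c, s]
    mu = 1.0 / (1.0 + np.exp(-eta))  # assumed inverse-logit link
    # Mean-precision parameterisation: this Beta distribution has mean mu.
    return rng.beta(mu * phi, (1.0 - mu) * phi)


draws = [simulate_result(r=0, c=0, s=0) for _ in range(1000)]
print(np.mean(draws))
```

The sample mean of many draws for the same rider, constructor and season approaches the inverse logit of the summed effects, which is the quantity the regression targets.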

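When the weather is also taken into account, as a boolean variable $weather$, the long-term skill of each rider becomes a baseline $\gamma_{0r}$ plus a rider-specific weather effect $\gamma_{1r}$: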

$$ \beta_r = \gamma_{0r} + \gamma_{1r} \cdot weather $$
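A small sketch of what the rider-specific weather coefficient does to the expected result. It assumes the indicator is coded 1 for wet races and 0 for dry ones, and the same logit link as above; the function `expected_poc`, its arguments and the coefficient values are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Map a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_poc(gamma0: float, gamma1: float, weather: int,
                 other_effects: float = 0.0) -> float:
    """Beta mean when beta_r = gamma0 + gamma1 * weather.

    weather: assumed 0/1 indicator (1 = wet).
    other_effects: sum of the seasonal and constructor terms
    (beta_rs + beta_c + beta_cs), set to 0 here for simplicity.
    """
    beta_r = gamma0 + gamma1 * weather
    return inv_logit(beta_r + other_effects)


# Hypothetical rider: average in the dry, stronger in the wet.
dry = expected_poc(0.0, 0.8, weather=0)
wet = expected_poc(0.0, 0.8, weather=1)
print(round(dry, 3), round(wet, 3))
```

Because each rider has their own $\gamma_{1r}$, the model can rank riders differently in dry and wet conditions, which is what allows it to compare the rider-versus-bike gap across the two.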

The Framework

The model was estimated with the R package brms using its default priors for all parameter types. We ran 4 Markov chain Monte Carlo (MCMC) chains of 10000 iterations each, with a fixed burn-in (warm-up) of 1000 iterations.
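Assuming the 10000 iterations per chain include the 1000 warm-up iterations (which is how brms counts its `iter` and `warmup` arguments) and no thinning, the number of posterior draws available for inference is:

$$ \underbrace{4}_{\text{chains}} \times (\underbrace{10000}_{\text{iterations}} - \underbrace{1000}_{\text{warm-up}}) = 4 \times 9000 = 36000 \ \text{draws} $$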

Besides the skill and advantage values for each rider, constructor and season, the model outputs the standard deviations of the distributions of these parameters ($\sigma_r$, $\sigma_{rs}$, $\sigma_c$, $\sigma_{cs}$). These are the quantities we are most interested in, because they measure how much riders and constructors contribute to the variation in race results.
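One natural way to turn these standard deviations into a single rider-versus-bike number is the share of the summed random-effect variance that is due to the rider terms. The sketch below assumes the four standard deviations are on the logit scale of the linear predictor; the function name and the input values are hypothetical, not results from this repository.

```python
def rider_variance_share(sigma_r: float, sigma_rs: float,
                         sigma_c: float, sigma_cs: float) -> float:
    """Fraction of the random-effect variance (logit scale) due to riders."""
    rider = sigma_r ** 2 + sigma_rs ** 2              # long-term + seasonal rider
    constructor = sigma_c ** 2 + sigma_cs ** 2        # long-term + seasonal bike
    total = rider + constructor
    if total == 0:
        raise ValueError("all standard deviations are zero")
    return rider / total


# Hypothetical values: rider SD twice the constructor SD, no seasonal terms.
print(rider_variance_share(1.0, 0.0, 0.5, 0.0))
```

Calling it once per posterior draw of the four standard deviations gives a posterior distribution for the share instead of a single point estimate.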

The Results

We are satisfied with the results. Both models produced the results we expected: the base model shows that in MotoGP the rider's ability is much more influential than the strength of the bike, and the weather model shows that this gap becomes even wider in wet races.

We therefore believe that the second model is the more complete one and the better suited to estimating the contributions of bike and rider to the outcome of a race under different weather conditions. Its real strength is precisely this ability to distinguish between weather conditions: it contains the base model and extends it, allowing a more detailed analysis.

Here are some plots of the model outputs that help in understanding the results:

Figures: Posterior Check; Overall Performance; Skill Evolution.

The Report

All the details of the project are discussed extensively in the report included in the repository, which also presents the main results of the model and their numerical and qualitative interpretation for interested readers.