Creating support for expected goals and game state values in our kloppy data model
DriesDeprest opened this issue · 2 comments
Expected goals
By adding an optional xg attribute to our ShotEvent class, we can support the widely used expected goals property in kloppy. This property could be fed by the raw input data during deserialization (e.g. StatsBomb) or calculated at a later stage by the user with an xG model of choice.
Proposed implementation:
@dataclass(repr=False)
@docstring_inherit_attributes(Event)
class ShotEvent(Event):
    """
    ShotEvent

    Attributes:
        event_type (EventType): `EventType.SHOT` (See [`EventType`][kloppy.domain.models.event.EventType])
        event_name (str): `"shot"`,
        result_coordinates (Point): See [`Point`][kloppy.domain.models.pitch.Point]
        result (ShotResult): See [`ShotResult`][kloppy.domain.models.event.ShotResult]
        xg (ExpectedGoal): See [`ExpectedGoal`][kloppy.domain.models.event.ExpectedGoal]
    """

    result: ShotResult
    result_coordinates: Point = None
    event_type: EventType = EventType.SHOT
    event_name: str = "shot"
    # Needs a default because it follows fields with defaults; the string
    # annotation allows ExpectedGoal to be defined further down the module.
    xg: Optional["ExpectedGoal"] = None

@dataclass
class ExpectedGoal:
    """
    Expected goal metrics of an event

    Attributes:
        xg: The probability of scoring from the shot situation, not considering shot execution characteristics
        execution_xg: The probability of scoring following the execution of the shot
        gk_difficulty_xg: The probability of a goalkeeper conceding a goal
    """

    xg: Optional[float] = field(default=None)
    execution_xg: Optional[float] = field(default=None)
    gk_difficulty_xg: Optional[float] = field(default=None)

    @property
    def net_shot_execution(self) -> Optional[float]:
        return None if None in (self.xg, self.execution_xg) else self.execution_xg - self.xg
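To illustrate how the derived property would behave, here is a minimal, self-contained sketch that restates the proposed ExpectedGoal dataclass. The numeric values are made up for illustration and do not come from any provider.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExpectedGoal:
    # Restated from the proposal so that this snippet runs on its own.
    xg: Optional[float] = None
    execution_xg: Optional[float] = None
    gk_difficulty_xg: Optional[float] = None

    @property
    def net_shot_execution(self) -> Optional[float]:
        # Undefined unless both the pre-shot and post-shot values are known.
        if self.xg is None or self.execution_xg is None:
            return None
        return self.execution_xg - self.xg


# Illustrative values only.
well_struck = ExpectedGoal(xg=0.25, execution_xg=0.75)
pre_shot_only = ExpectedGoal(xg=0.25)

print(well_struck.net_shot_execution)    # positive: the execution added value
print(pre_shot_only.net_shot_execution)  # None: execution_xg is missing
```

A provider that only ships a pre-shot xG would simply leave the other fields at None, and the derived property stays None as well.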
Game state values
By adding an optional gsv attribute to our Event class, holding optional gsv_scoring_before, gsv_scoring_after, gsv_conceding_before and gsv_conceding_after values, we can support the widely used game-state-based value models in kloppy. These values could be fed by the raw input data during deserialization (e.g. StatsBomb's on-the-ball value models) or calculated at a later stage by the user with a game state value model of choice (e.g. VAEP).
Proposed implementation:
@dataclass
@docstring_inherit_attributes(DataRecord)
class Event(DataRecord, ABC):
    """
    Abstract event baseclass. All other event classes inherit from this class.

    Attributes:
        event_id: identifier given by provider
        team: See [`Team`][kloppy.domain.models.common.Team]
        player: See [`Player`][kloppy.domain.models.common.Player]
        coordinates: Coordinates where event happened. See [`Point`][kloppy.domain.models.pitch.Point]
        raw_event: Dict
        state: Dict[str, Any]
        qualifiers: See [`Qualifier`][kloppy.domain.models.event.Qualifier]
    """

    event_id: str
    team: Team
    player: Player
    coordinates: Point
    result: Optional[ResultType]
    # String annotation: GameStateValue is defined further down the module.
    gsv: Optional["GameStateValue"]
    raw_event: Dict
    state: Dict[str, Any]
    related_event_ids: List[str]
    qualifiers: List[Qualifier]
    freeze_frame: Optional["Frame"]

@dataclass
class GameStateValue:
    """
    Game state value metrics of an event.

    Attributes:
        gsv_scoring_before (Optional[float]): The probability the team will score in X actions prior to the event.
        gsv_scoring_after (Optional[float]): The probability the team will score in X actions after the event.
        gsv_conceding_before (Optional[float]): The probability the team will concede a goal in X actions before the event.
        gsv_conceding_after (Optional[float]): The probability the team will concede a goal in X actions after the event.
    """

    gsv_scoring_before: Optional[float] = field(default=None)
    gsv_scoring_after: Optional[float] = field(default=None)
    gsv_conceding_before: Optional[float] = field(default=None)
    gsv_conceding_after: Optional[float] = field(default=None)

    @property
    def gsv_scoring_net(self) -> Optional[float]:
        return None if None in (self.gsv_scoring_before, self.gsv_scoring_after) else self.gsv_scoring_after - self.gsv_scoring_before

    @property
    def gsv_conceding_net(self) -> Optional[float]:
        return None if None in (self.gsv_conceding_before, self.gsv_conceding_after) else self.gsv_conceding_after - self.gsv_conceding_before

    @property
    def gsv_total_net(self) -> Optional[float]:
        if None in (self.gsv_scoring_before, self.gsv_scoring_after, self.gsv_conceding_before, self.gsv_conceding_after):
            return None
        return (self.gsv_scoring_after - self.gsv_scoring_before) - (self.gsv_conceding_after - self.gsv_conceding_before)
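As a worked example of the net value, the sketch below restates a trimmed-down GameStateValue and evaluates it for one action. The probabilities are invented for illustration; in practice they would come from the provider or from a model such as VAEP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameStateValue:
    # Trimmed restatement of the proposal so that this snippet runs on its own.
    gsv_scoring_before: Optional[float] = None
    gsv_scoring_after: Optional[float] = None
    gsv_conceding_before: Optional[float] = None
    gsv_conceding_after: Optional[float] = None

    @property
    def gsv_total_net(self) -> Optional[float]:
        values = (
            self.gsv_scoring_before,
            self.gsv_scoring_after,
            self.gsv_conceding_before,
            self.gsv_conceding_after,
        )
        if None in values:
            return None
        scoring_net = self.gsv_scoring_after - self.gsv_scoring_before
        conceding_net = self.gsv_conceding_after - self.gsv_conceding_before
        # An action is valuable when it raises the chance of scoring
        # and/or lowers the chance of conceding.
        return scoring_net - conceding_net


# Illustrative values only: scoring chance up by 0.25, conceding chance down by 0.0625.
through_ball = GameStateValue(
    gsv_scoring_before=0.25,
    gsv_scoring_after=0.5,
    gsv_conceding_before=0.125,
    gsv_conceding_after=0.0625,
)
incomplete = GameStateValue(gsv_scoring_after=0.5)

print(through_ball.gsv_total_net)
print(incomplete.gsv_total_net)  # None: not all four values are available
```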
Any feedback is highly welcome!
A few thoughts:
- I would also add xA, xT, execution ratings, decision ratings, win probability, pitch control, pitch influence, and pressing intensity. 😄 But jokes aside, my main point is that if we create a separate field for each metric, things could get pretty complex and it might quickly explode. I think it's a better idea to use a single list, dict or custom container to store all metrics.
- I would attach this container for metrics to the DataRecord class since it is also possible to compute metrics for tracking data frames (e.g., pitch control).
- I suppose we want a base class for metrics and a few subclasses.
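from abc import ABC
from dataclasses import dataclass, field
from typing import Dict, Optional

import numpy as np

from kloppy.domain.models.pitch import Point


# kw_only=True (Python 3.10+) lets subclasses add required fields
# after the defaulted `provider` field.
@dataclass(kw_only=True)
class Metric(ABC):
    name: str
    provider: Optional["Provider"] = None


@dataclass(kw_only=True)
class ScalarMetric(Metric):
    value: float


@dataclass(kw_only=True)
class PlayerMetric(Metric):
    value: Dict["Player", float]


@dataclass(kw_only=True)
class SurfaceMetric(Metric):
    value: np.ndarray

    def value_at(self, loc: Point) -> float:
        # Assumes `loc` is already expressed in grid-cell coordinates.
        return self.value[int(loc.y), int(loc.x)]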
Then, you can define classes for the most common metrics as follows:
@dataclass(kw_only=True)
class ExpectedGoals(ScalarMetric):
    """Expected goals"""

    name: str = "xG"


@dataclass(kw_only=True)
class PostShotExpectedGoals(ScalarMetric):
    """Post-shot expected goals"""

    name: str = "PsXG"


# Derives from Metric rather than ScalarMetric so that `value` can be a
# computed property instead of a stored field.
@dataclass(kw_only=True)
class GameStateValue(Metric):
    """Game state value"""

    gsv_scoring_before: Optional[float] = field(default=None)
    gsv_scoring_after: Optional[float] = field(default=None)
    gsv_conceding_before: Optional[float] = field(default=None)
    gsv_conceding_after: Optional[float] = field(default=None)

    @property
    def gsv_scoring_net(self) -> Optional[float]:
        return None if None in (self.gsv_scoring_before, self.gsv_scoring_after) else self.gsv_scoring_after - self.gsv_scoring_before

    @property
    def gsv_conceding_net(self) -> Optional[float]:
        return None if None in (self.gsv_conceding_before, self.gsv_conceding_after) else self.gsv_conceding_after - self.gsv_conceding_before

    @property
    def value(self) -> Optional[float]:
        if None in (self.gsv_scoring_before, self.gsv_scoring_after, self.gsv_conceding_before, self.gsv_conceding_after):
            return None
        return (self.gsv_scoring_after - self.gsv_scoring_before) - (self.gsv_conceding_after - self.gsv_conceding_before)
- I am not sure whether "metric" is the right terminology here. In the context of soccer analysis, a metric typically involves the aggregation or analysis of multiple data points. For example, if you are tracking the number of goals scored by a soccer player in each game, each individual game's goal count would be a data point. If you calculate the average number of goals scored per game over a season, that average becomes a metric. To make this distinction, I prefer to use "statistic" in the context of a single data point.
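To make the statistic/metric distinction concrete, here is a tiny sketch of the goals-per-game example. The goal counts are made up.

```python
# Each entry is a single data point (a "statistic"): goals scored in one game.
goals_per_game = [1, 0, 2, 1]

# Aggregating the data points yields a "metric": average goals per game.
average_goals_per_game = sum(goals_per_game) / len(goals_per_game)

print(average_goals_per_game)
```

Under this reading, a per-shot xG value would be a statistic attached to one record, while a team's total xG in a match would be a metric computed from many such records.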
- Good point 😅. I agree that adding a list of Statistics is probably a better way to keep it clean and still have a lot of flexibility in adding statistics.
- Agree.
- Makes sense!
- Fine to use "statistic" as the terminology.
Below, an updated version of how the DataRecord class would change, based on your inputs:
@dataclass
class DataRecord(ABC):
    """
    DataRecord

    Attributes:
        dataset: Reference to the dataset this record belongs to.
        prev_record: Reference to the previous DataRecord.
        next_record: Reference to the next DataRecord.
        period: See [`Period`][kloppy.domain.models.common.Period]
        timestamp: Timestamp of occurrence.
        ball_owning_team: See [`Team`][kloppy.domain.models.common.Team]
        ball_state: See [`BallState`][kloppy.domain.models.common.BallState]
        statistics: List of Statistics associated with this record.
    """

    dataset: "Dataset" = field(init=False)
    prev_record: Optional["DataRecord"] = field(init=False)
    next_record: Optional["DataRecord"] = field(init=False)
    period: "Period"
    timestamp: float
    ball_owning_team: Optional["Team"]
    ball_state: Optional["BallState"]
    statistics: List[Statistic] = field(default_factory=list)
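For illustration, this is roughly how a statistics list on a record could be filled and queried. Everything in the sketch is a hypothetical stand-in: Record is a stripped-down substitute for DataRecord, the Statistic classes are placeholders for whatever hierarchy ends up being implemented, and get_statistic is an assumed helper, not an existing kloppy function.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Type, TypeVar


@dataclass
class Statistic:
    # Hypothetical stand-in for the base class discussed in this thread.
    name: str


@dataclass
class ScalarStatistic(Statistic):
    value: float


@dataclass
class Record:
    # Hypothetical, stripped-down stand-in for DataRecord.
    timestamp: float
    # default_factory gives every record its own list instead of a shared one.
    statistics: List[Statistic] = field(default_factory=list)


S = TypeVar("S", bound=Statistic)


def get_statistic(record: Record, statistic_type: Type[S]) -> Optional[S]:
    """Return the first statistic of the given type, or None if absent."""
    for statistic in record.statistics:
        if isinstance(statistic, statistic_type):
            return statistic
    return None


shot = Record(timestamp=12.5)
shot.statistics.append(ScalarStatistic(name="xG", value=0.125))
other = Record(timestamp=13.0)

found = get_statistic(shot, ScalarStatistic)
missing = get_statistic(other, ScalarStatistic)
print(found, missing)
```

A lookup by type keeps call sites independent of how many statistics a given provider attaches to a record.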
I'll probably be working on implementing this in the near future and adding a parser for StatsBomb.