PySport/kloppy

Incorrect coordinate scaling Opta & StatsBomb event Deserializer & "secondspectrum"?

UnravelSports opened this issue · 2 comments

I'm loading some Opta and StatsBomb data with coordinates="secondspectrum", but it seems as though the resulting coordinates are not in the correct range.

Here are the min/max values for Opta:

coordinates_x
max:  49.7
min:  -48.1
coordinates_y
max:  50.0
min:  -50.0
end_coordinates_x
max:  50.0
min:  -48.1
end_coordinates_y
max:  50.0
min:  -50.0

Here are the min/max values for StatsBomb with `coordinates="secondspectrum"

coordinates_x
max:  59.95
min:  -57.65
coordinates_y
max:  39.95
min:  -39.95
end_coordinates_x
max:  59.95
min:  -57.95
end_coordinates_y
max:  39.95
min:  -39.95

These values should be on the -52.5, 52.5 and -34, 34 range instead.

Edit: After some further inspection it looks like the SecondSpectrumCoordinateSystem behaviour is incorrect.

@dataclass
class SecondSpectrumCoordinateSystem(CoordinateSystem):
    @property
    def provider(self) -> Provider:
        return Provider.SECONDSPECTRUM

    @property
    def origin(self) -> Origin:
        return Origin.CENTER

    @property
    def vertical_orientation(self) -> VerticalOrientation:
        return VerticalOrientation.BOTTOM_TO_TOP

    @property
    def pitch_dimensions(self) -> PitchDimensions:
        return PitchDimensions(
            x_dim=Dimension(-1 * self.length / 2, self.length / 2),
            y_dim=Dimension(-1 * self.width / 2, self.width / 2),
            length=self.length,
            width=self.width,
        )

Looking at this snippet we can see that it goes wrong when self.length and self.width are set to the length and width provided by the provider (0, 100) and (0,140?) respectively for Opta and StatsBomb for x and (0, 100) and (0, 80?) for y.

Somewhere in the SecondSpectrumCoordinateSystem (or another place entirely) we need to set x_dim, y_dim to (-52.5, 52.5) and (-34, 34)

But, I'm not sure if simply setting it as shown below give the correct/expected behaviour.

@property
    def pitch_dimensions(self) -> PitchDimensions:
        return PitchDimensions(
            x_dim=Dimension(-52.5, 52.5),
            y_dim=Dimension(-34.0, 34.0),
        )

I'd be curious to hear what your thoughts are about the following implementation:

  1. We introduce custom_pitch_dimensions as a parameter to every loading function. This will help setting a fixed x_dim, y_dim and the actual pitch length and width.

We do this such that we can for example first load tracking data. Then from that tracking data fetch the actual pitch dimensions (using dataset.metadata.pitch_dimensions) and insert them into the loading function to align the coordinate systems.

By potentially overwriting the x_dim and y_dim parameters we can make sure all data we load can be fixed to the same dimensions.

data = opta.load(
    f24_data='eventdetails.xml',
    f7_data='matchresults.xml',
    coordinates='secondspectrum',
    custom_pitch_dimensions=PitchDimensions(
        x_dim=Dimension(-52.5, 52.5),
        y_dim=Dimension(-34, 34),
        length=110,
        width=70,
    )
)
  1. We pass the custom_pitch_dimensions to the deserializer, in this example the OptaDeserializer
deserializer = OptaDeserializer(
        event_types=event_types,
        coordinate_system=coordinates,
        event_factory=event_factory or get_config("event_factory"),
        pitch_dimensions=custom_pitch_dimensions
    )
  1. This pitch_dimensions argument comes from the base class EventDataDeserializer.

Within the EventDataDeserialzer we set self.pitch_dimensions as follows:

class EventDataDeserializer(ABC, Generic[T]):
    def __init__(
        self,
        event_types: Optional[List[Union[EventType, str]]] = None,
        coordinate_system: Optional[Union[str, Provider]] = None,
        event_factory: Optional[EventFactory] = None,
        pitch_dimensions: Optional[PitchDimensions] = None
    ):
        .... 
        ....

        self.pitch_dimensions = pitch_dimensions
  1. Then, inside get_transformer() inside the EventDataDeserializer we do check if we have set custom_pitch_dimensions in the loading function.

If we do, we pass custom_pitch_dimensions as part of the kwargs into build_coordinate_system.

Edit: After reading through this again we don't need to do this check, but we can simply pass self.pitch_dimensions directly the the existing build_coordinate_system() function, because within SecondSpectrumCoordinateSystem we do the same check. (see below)

def get_transformer(
        self, length: float, width: float
    ) -> DatasetTransformer:
        from_coordinate_system = build_coordinate_system(
            self.provider,
            length=length,
            width=width,
        )

        if self.pitch_dimensions:
            to_coordinate_system = build_coordinate_system(
                self.coordinate_system,
                length=length,
                width=width,
                custom_pitch_dimensions=self.pitch_dimensions
            )
        else:
            to_coordinate_system = build_coordinate_system(
                self.coordinate_system,
                length=length,
                width=width,
            )

        return DatasetTransformer(
            from_coordinate_system=from_coordinate_system,
            to_coordinate_system=to_coordinate_system,
        )
  1. To accept this custom_pitch_dimensions parameter from kwargs we need to have it as an option inside the coordinate CoordinateSystem class. As shown below.

Now, this is a bit ugly, because CoordinateSystem also has length and width which are also inside PitchDimensions so we might need to clean this up.

@dataclass
class CoordinateSystem(ABC):
    normalized: bool
    length: float = None
    width: float = None
    custom_pitch_dimensions: PitchDimensions = None
  1. Then finally inside each SomeProviderCoordinateSystem class we check if custom_pitch_dimensions has already been initialized, and if so use those PitchDimensions over the regular option.
@dataclass
class SecondSpectrumCoordinateSystem(CoordinateSystem):
    @property
    def provider(self) -> Provider:
        return Provider.SECONDSPECTRUM

    @property
    def origin(self) -> Origin:
        return Origin.CENTER

    @property
    def vertical_orientation(self) -> VerticalOrientation:
        return VerticalOrientation.BOTTOM_TO_TOP

    @property
    def pitch_dimensions(self) -> PitchDimensions:
        if self.custom_pitch_dimensions is not None:  # <-- We do this check here
            return self.custom_pitch_dimensions
        else:
            return PitchDimensions(
                x_dim=Dimension(-1 * self.length / 2, self.length / 2),
                y_dim=Dimension(-1 * self.width / 2, self.width / 2),
                length=self.length,
                width=self.width,
            )

By doing it like this we can still benefit from passing coordinates="secondspectrum" in the loading function, and as such use the pitch orientation and origin settings. And all we do is overwrite the domain and pitch lengths/width.

Finally, I think having the option to align the pitch dimensions between tracking and event data will be a first step into introducing a matching algo for Tracking and Event data.

Now, I'm actually thinking of simply provider a whole CustomCoordinateSystem as an option within coordinates, as shown below.

Currently the biggest issue here is that there is no transform operation for orientation only for vertical_orientation. And, orientation is not a part of the CoordinateSystems classes. I'll still need to figure that part out...

data = opta.load(
    f24_data='eventdetails.xml',
    f7_data='matchresults.xml',
    coordinates=CustomCoordinateSystem(
        normalized=False,
        _origin=Origin.CENTER,
        _orientation=Orientation.FIXED_HOME_AWAY,  
        _vertical_orientation=VerticalOrientation.BOTTOM_TO_TOP,
        x_dim=Dimension(-52.5, 52.5),
        y_dim=Dimension(-34, 34),
        length=105,
        width=68,
    )
)

Note: Ignore the ugly paramaters starting with an underscore, I had to do that because otherwise we'd have conflicts with the CoordinateSystem default properties
We had to change the load function to coordinates: Optional[Union[str, CustomCoordinateSystem]] = None and subsequently do some magic to keep the original 'coordinates' functionality and also be able to pass a custom_coordinates_system.

Another option would be to simply have custom_coordinate_system as a parameter in opta.load but I think this is more user friendly.

if isinstance(coordinates, CustomCoordinateSystem):
        custom_coordinate_system = coordinates
        coordinates = Provider.CUSTOM

    serializer = OptaEventDeserializer(
        event_types=event_types,
        coordinate_system=coordinates,
        event_factory=event_factory or get_config("event_factory"),
        custom_coordinate_system=custom_coordinate_system
    )

Inside the Deserializer we add the below check to get_transformer(). Note, we also have to add "CUSTOM" to the Provider class.

if self.coordinate_system == Provider.CUSTOM:
            to_coordinate_system = self.custom_coordinate_system
        else:
            to_coordinate_system = build_coordinate_system(
                self.coordinate_system,
                length=length,
                width=width
            )

We create a CustomCoordinateSystem that has some dataclass magic to avoid the errors we get when we initialize it with default CoordinateSystem parameters, like so:

T = TypeVar("T")

def required() -> T:
    f: T
    def factory() -> T:
        field_name = f.name  # type: ignore[attr-defined]
        raise ValueError(f"field '{field_name}' required")

    f = field(default_factory=factory)
    return f

@dataclass()
class CustomCoordinateSystem(CoordinateSystem):
    _origin: Origin = required()
    _orientation: Orientation = required()
    _vertical_orientation: VerticalOrientation = required()
    _provider: Provider = Provider.CUSTOM
    x_dim: Dimension = required()
    y_dim: Dimension = required()   
    normalized: bool = field(default=False)

    def __post_init__(self):
        self._pitch_dimensions = PitchDimensions(
            x_dim=self.x_dim,
            y_dim=self.y_dim,
            length=self.length,
            width=self.width,
        )

    @property
    def provider(self) -> Provider:
        return self._provider

    @property
    def origin(self) -> Origin:
        return self._origin

    @property
    def vertical_orientation(self) -> VerticalOrientation:
        return self._vertical_orientation

    @property
    def pitch_dimensions(self) -> PitchDimensions:
        return self._pitch_dimensions