Incorrect coordinate scaling Opta & StatsBomb event Deserializer & "secondspectrum"?
UnravelSports opened this issue · 2 comments
I'm loading some Opta and StatsBomb data with coordinates="secondspectrum"
, but it seems as though the resulting coordinates are not in the correct range.
Here are the min/max values for Opta:
coordinates_x
max: 49.7
min: -48.1
coordinates_y
max: 50.0
min: -50.0
end_coordinates_x
max: 50.0
min: -48.1
end_coordinates_y
max: 50.0
min: -50.0
Here are the min/max values for StatsBomb with `coordinates="secondspectrum"
coordinates_x
max: 59.95
min: -57.65
coordinates_y
max: 39.95
min: -39.95
end_coordinates_x
max: 59.95
min: -57.95
end_coordinates_y
max: 39.95
min: -39.95
These values should be on the -52.5, 52.5 and -34, 34 range instead.
Edit: After some further inspection it looks like the SecondSpectrumCoordinateSystem
behaviour is incorrect.
@dataclass
class SecondSpectrumCoordinateSystem(CoordinateSystem):
@property
def provider(self) -> Provider:
return Provider.SECONDSPECTRUM
@property
def origin(self) -> Origin:
return Origin.CENTER
@property
def vertical_orientation(self) -> VerticalOrientation:
return VerticalOrientation.BOTTOM_TO_TOP
@property
def pitch_dimensions(self) -> PitchDimensions:
return PitchDimensions(
x_dim=Dimension(-1 * self.length / 2, self.length / 2),
y_dim=Dimension(-1 * self.width / 2, self.width / 2),
length=self.length,
width=self.width,
)
Looking at this snippet we can see that it goes wrong when self.length and self.width are set to the length and width provided by the provider (0, 100) and (0,140?) respectively for Opta and StatsBomb for x and (0, 100) and (0, 80?) for y.
Somewhere in the SecondSpectrumCoordinateSystem
(or another place entirely) we need to set x_dim, y_dim to (-52.5, 52.5) and (-34, 34)
But, I'm not sure if simply setting it as shown below give the correct/expected behaviour.
@property
def pitch_dimensions(self) -> PitchDimensions:
return PitchDimensions(
x_dim=Dimension(-52.5, 52.5),
y_dim=Dimension(-34.0, 34.0),
)
I'd be curious to hear what your thoughts are about the following implementation:
- We introduce custom_pitch_dimensions as a parameter to every loading function. This will help setting a fixed x_dim, y_dim and the actual pitch length and width.
We do this such that we can for example first load tracking data. Then from that tracking data fetch the actual pitch dimensions (using dataset.metadata.pitch_dimensions
) and insert them into the loading function to align the coordinate systems.
By potentially overwriting the x_dim
and y_dim
parameters we can make sure all data we load can be fixed to the same dimensions.
data = opta.load(
f24_data='eventdetails.xml',
f7_data='matchresults.xml',
coordinates='secondspectrum',
custom_pitch_dimensions=PitchDimensions(
x_dim=Dimension(-52.5, 52.5),
y_dim=Dimension(-34, 34),
length=110,
width=70,
)
)
- We pass the
custom_pitch_dimensions
to the deserializer, in this example theOptaDeserializer
deserializer = OptaDeserializer(
event_types=event_types,
coordinate_system=coordinates,
event_factory=event_factory or get_config("event_factory"),
pitch_dimensions=custom_pitch_dimensions
)
- This
pitch_dimensions
argument comes from the base classEventDataDeserializer
.
Within the EventDataDeserialzer
we set self.pitch_dimensions
as follows:
class EventDataDeserializer(ABC, Generic[T]):
def __init__(
self,
event_types: Optional[List[Union[EventType, str]]] = None,
coordinate_system: Optional[Union[str, Provider]] = None,
event_factory: Optional[EventFactory] = None,
pitch_dimensions: Optional[PitchDimensions] = None
):
....
....
self.pitch_dimensions = pitch_dimensions
- Then, inside
get_transformer()
inside theEventDataDeserializer
we do check if we have setcustom_pitch_dimensions
in the loading function.
If we do, we pass custom_pitch_dimensions
as part of the kwargs
into build_coordinate_system
.
Edit: After reading through this again we don't need to do this check, but we can simply pass self.pitch_dimensions directly the the existing build_coordinate_system()
function, because within SecondSpectrumCoordinateSystem
we do the same check. (see below)
def get_transformer(
self, length: float, width: float
) -> DatasetTransformer:
from_coordinate_system = build_coordinate_system(
self.provider,
length=length,
width=width,
)
if self.pitch_dimensions:
to_coordinate_system = build_coordinate_system(
self.coordinate_system,
length=length,
width=width,
custom_pitch_dimensions=self.pitch_dimensions
)
else:
to_coordinate_system = build_coordinate_system(
self.coordinate_system,
length=length,
width=width,
)
return DatasetTransformer(
from_coordinate_system=from_coordinate_system,
to_coordinate_system=to_coordinate_system,
)
- To accept this
custom_pitch_dimensions
parameter fromkwargs
we need to have it as an option inside the coordinateCoordinateSystem
class. As shown below.
Now, this is a bit ugly, because CoordinateSystem
also has length
and width
which are also inside PitchDimensions
so we might need to clean this up.
@dataclass
class CoordinateSystem(ABC):
normalized: bool
length: float = None
width: float = None
custom_pitch_dimensions: PitchDimensions = None
- Then finally inside each
SomeProviderCoordinateSystem
class we check ifcustom_pitch_dimensions
has already been initialized, and if so use thosePitchDimensions
over the regular option.
@dataclass
class SecondSpectrumCoordinateSystem(CoordinateSystem):
@property
def provider(self) -> Provider:
return Provider.SECONDSPECTRUM
@property
def origin(self) -> Origin:
return Origin.CENTER
@property
def vertical_orientation(self) -> VerticalOrientation:
return VerticalOrientation.BOTTOM_TO_TOP
@property
def pitch_dimensions(self) -> PitchDimensions:
if self.custom_pitch_dimensions is not None: # <-- We do this check here
return self.custom_pitch_dimensions
else:
return PitchDimensions(
x_dim=Dimension(-1 * self.length / 2, self.length / 2),
y_dim=Dimension(-1 * self.width / 2, self.width / 2),
length=self.length,
width=self.width,
)
By doing it like this we can still benefit from passing coordinates="secondspectrum"
in the loading function, and as such use the pitch orientation and origin settings. And all we do is overwrite the domain and pitch lengths/width.
Finally, I think having the option to align the pitch dimensions between tracking and event data will be a first step into introducing a matching algo for Tracking and Event data.
Now, I'm actually thinking of simply provider a whole CustomCoordinateSystem
as an option within coordinates
, as shown below.
Currently the biggest issue here is that there is no transform operation for orientation
only for vertical_orientation
. And, orientation
is not a part of the CoordinateSystems classes. I'll still need to figure that part out...
data = opta.load(
f24_data='eventdetails.xml',
f7_data='matchresults.xml',
coordinates=CustomCoordinateSystem(
normalized=False,
_origin=Origin.CENTER,
_orientation=Orientation.FIXED_HOME_AWAY,
_vertical_orientation=VerticalOrientation.BOTTOM_TO_TOP,
x_dim=Dimension(-52.5, 52.5),
y_dim=Dimension(-34, 34),
length=105,
width=68,
)
)
Note: Ignore the ugly paramaters starting with an underscore, I had to do that because otherwise we'd have conflicts with the CoordinateSystem
default properties
We had to change the load
function to coordinates: Optional[Union[str, CustomCoordinateSystem]] = None
and subsequently do some magic to keep the original 'coordinates' functionality and also be able to pass a custom_coordinates_system
.
Another option would be to simply have custom_coordinate_system
as a parameter in opta.load
but I think this is more user friendly.
if isinstance(coordinates, CustomCoordinateSystem):
custom_coordinate_system = coordinates
coordinates = Provider.CUSTOM
serializer = OptaEventDeserializer(
event_types=event_types,
coordinate_system=coordinates,
event_factory=event_factory or get_config("event_factory"),
custom_coordinate_system=custom_coordinate_system
)
Inside the Deserializer we add the below check to get_transformer()
. Note, we also have to add "CUSTOM" to the Provider class.
if self.coordinate_system == Provider.CUSTOM:
to_coordinate_system = self.custom_coordinate_system
else:
to_coordinate_system = build_coordinate_system(
self.coordinate_system,
length=length,
width=width
)
We create a CustomCoordinateSystem
that has some dataclass
magic to avoid the errors we get when we initialize it with default CoordinateSystem
parameters, like so:
T = TypeVar("T")
def required() -> T:
f: T
def factory() -> T:
field_name = f.name # type: ignore[attr-defined]
raise ValueError(f"field '{field_name}' required")
f = field(default_factory=factory)
return f
@dataclass()
class CustomCoordinateSystem(CoordinateSystem):
_origin: Origin = required()
_orientation: Orientation = required()
_vertical_orientation: VerticalOrientation = required()
_provider: Provider = Provider.CUSTOM
x_dim: Dimension = required()
y_dim: Dimension = required()
normalized: bool = field(default=False)
def __post_init__(self):
self._pitch_dimensions = PitchDimensions(
x_dim=self.x_dim,
y_dim=self.y_dim,
length=self.length,
width=self.width,
)
@property
def provider(self) -> Provider:
return self._provider
@property
def origin(self) -> Origin:
return self._origin
@property
def vertical_orientation(self) -> VerticalOrientation:
return self._vertical_orientation
@property
def pitch_dimensions(self) -> PitchDimensions:
return self._pitch_dimensions