PySport/kloppy

PassType.ASSIST is being used both as a shot and goal assist interchangeably

DriesDeprest opened this issue · 8 comments

In our Opta deserializer, we add the PassType.ASSIST qualifier to a pass with an Opta qualifier type 210, see:
https://github.com/PySport/kloppy/blob/master/kloppy/infra/serializers/event/opta/deserializer.py#L543-L544
Qualifier type 210 means a shot assist, as you can see below:
[image]

In our Statsbomb deserializer, we add the PassType.ASSIST qualifier to a pass with a "goal_assist" tag, see:
https://github.com/PySport/kloppy/blob/master/kloppy/infra/serializers/event/statsbomb/deserializer.py#L291-L293

I would suggest moving from PassType.ASSIST to a PassType.SHOT_ASSIST and a PassType.GOAL_ASSIST. A goal assist would have both qualifiers. This way we are explicit about what we mean and we are capable of handling both shot and goal assists.
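A minimal sketch of what this proposal could look like. The enum below is a stand-in, not kloppy's actual `PassType`, and the helper `assist_qualifiers` with its two boolean flags is a hypothetical name used only for illustration:

```python
from enum import Enum
from typing import List


class PassType(Enum):
    # Stand-in for kloppy's PassType; only the two proposed members are shown.
    SHOT_ASSIST = "SHOT_ASSIST"
    GOAL_ASSIST = "GOAL_ASSIST"


def assist_qualifiers(led_to_shot: bool, led_to_goal: bool) -> List[PassType]:
    """Return the assist qualifiers for a pass (hypothetical helper)."""
    qualifiers = []
    if led_to_shot or led_to_goal:
        # Every goal is also a shot, so a goal assist carries both qualifiers.
        qualifiers.append(PassType.SHOT_ASSIST)
    if led_to_goal:
        qualifiers.append(PassType.GOAL_ASSIST)
    return qualifiers
```

With this shape, filtering on `SHOT_ASSIST` returns every pass that led to a shot, and filtering on `GOAL_ASSIST` returns the subset that led to a goal.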

@koenvo agree?

koenvo commented

Correct me if I'm wrong but this information doesn't need to be loaded from the data itself but can be derived?

pseudo code:

if event.event_type == EventType.PASS:
  next_event = event.next()
  if next_event.event_type == EventType.SHOT:
    event.qualifiers.append(
        PassQualifier(
            PassType.ASSIST_GOAL if next_event.result == ShotResult.GOAL else PassType.ASSIST_SHOT
        )
    )

This way it works for all vendors. Should this work?

Curious what you think about this @JanVanHaaren

I think your definition of an assist as a pass directly followed by a shot event is too narrow.

If you look at the raw data of vendors (e.g. StatsBomb) you see that in between the pass, which gets annotated as an assist, and the shot, there often are carry or duel events, maybe even others.

Therefore, I would use the qualifiers in the raw data where possible. For vendors which don't support shot or goal assists, you can always still try to calculate it.

I agree that we should make the use of the PassType.ASSIST qualifier consistent across the deserializers for the different data providers. This example nicely illustrates that we need more formal definitions for our events and qualifiers to avoid ambiguity as data providers sometimes use the same terms for different concepts.

I'm also in favor of deriving qualifiers, and in some cases even events, from the raw data as much as possible. I believe that this approach would help us to arrive at more uniform and predictable behavior across our deserializers, especially for rather well-defined concepts such as shot assists or key passes and goal assists. However, in other cases, the qualifiers that data providers add to events might not be reconstitutable from the context.

I'm mostly familiar with the StatsBomb event data. They add a goal_assist field to passes leading to a goal and a shot_assist field to passes leading to a shot that was not a goal. However, a case could be made for assigning both qualifiers to passes that are goal assists. In my opinion, it all comes down to properly defining the qualifiers in the first place.
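To make the two options concrete, here is a rough sketch that maps the StatsBomb flags described above onto qualifier names. It assumes the flags are available as keys of a plain dict representing the pass; the function name, the `goal_implies_shot` switch, and the string qualifier names are all hypothetical and not part of kloppy:

```python
from typing import Dict, List


def statsbomb_assist_qualifiers(
    pass_dict: Dict, goal_implies_shot: bool = True
) -> List[str]:
    """Map raw assist flags to qualifier names (hypothetical helper)."""
    is_goal_assist = bool(pass_dict.get("goal_assist"))
    is_shot_assist = bool(pass_dict.get("shot_assist"))

    qualifiers = []
    # Option A (goal_implies_shot=True): a goal assist also counts as a shot assist.
    # Option B (goal_implies_shot=False): keep the provider's mutually exclusive flags.
    if is_shot_assist or (is_goal_assist and goal_implies_shot):
        qualifiers.append("SHOT_ASSIST")
    if is_goal_assist:
        qualifiers.append("GOAL_ASSIST")
    return qualifiers
```

Whichever option is chosen, writing it down as an explicit switch like this forces the definition of each qualifier to be stated once and applied to every provider.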

I don't fully see how you could properly derive shot or goal assists from the raw data. As I said, I think Koen's approach

if event.event_type == EventType.PASS:
  next_event = event.next()
  if next_event.event_type == EventType.SHOT:
    event.qualifiers.append(
        PassQualifier(
            PassType.ASSIST_GOAL if next_event.result == ShotResult.GOAL else PassType.ASSIST_SHOT
        )
    )

would result in too narrow a definition of shot and goal assists, as between the pass that gets annotated as an assist and the shot there are often carry or duel events, and maybe even others.

How would you suggest that we properly and accurately derive shot or goal assists from the raw data?

Automatically deriving assist qualifiers would indeed require slightly more sophisticated business logic, but all required information should be available in the data feeds to implement the most common assist definitions.

Moreover, automatically deriving assist qualifiers would even allow us to support multiple definitions as different leagues use slightly different definitions. For example, the Belgian Pro League uses the following definition to award goal assists to players.

[image: the Belgian Pro League's goal assist definition]

I'm still struggling with the following two subjective attributes in the Pro League's definition:

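  • "decisive pass" ("beslissende pass"): how do you define this?
  • "The position at which the goal scorer receives the ball is a place from which an immediate goal threat is unlikely." ("De positie waarop de doelpuntenmaker de bal ontvangt is een plaats van waaruit onmiddellijk doelgevaar onwaarschijnlijk is."): what zone of the field do we think this is?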

Other aspects I think we should consider when building our own custom definition of an assist:

  • What (if any) is the max duration between the assist pass and the shot?
  • What event types can occur between a pass and a shot, so that the pass is still considered an assist to the shot?
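The two questions above could become parameters of a derivation function. The following is only a sketch under stated assumptions: `Event` is a simplified stand-in rather than kloppy's event class, and the defaults (10 seconds, carries and duels allowed in between) are arbitrary placeholders, not an agreed definition:

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class Event:
    # Simplified stand-in for an event record; not kloppy's Event class.
    event_type: str   # e.g. "PASS", "CARRY", "DUEL", "SHOT"
    timestamp: float  # seconds since the start of the period
    team: str
    is_goal: bool = False


def derive_assist(
    events: List[Event],
    pass_index: int,
    max_gap_seconds: float = 10.0,  # placeholder value
    allowed_between: FrozenSet[str] = frozenset({"CARRY", "DUEL"}),  # placeholder
) -> Optional[str]:
    """Return "GOAL_ASSIST", "SHOT_ASSIST", or None for the pass at pass_index."""
    passing = events[pass_index]
    if passing.event_type != "PASS":
        return None
    for later in events[pass_index + 1:]:
        if later.timestamp - passing.timestamp > max_gap_seconds:
            return None  # the shot, if any, came too late
        if later.event_type == "SHOT":
            if later.team != passing.team:
                return None  # a shot by the opponent is not assisted by this pass
            return "GOAL_ASSIST" if later.is_goal else "SHOT_ASSIST"
        if later.event_type not in allowed_between:
            return None  # an event that breaks the pass-to-shot chain
    return None
```

This does not capture the subjective parts of a league definition, such as whether a pass was "decisive"; it only makes the time window and the permitted intermediate events explicit and configurable.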

I'd also propose that I first make our assist definitions consistent across providers in the current implementation, where we derive them from the data providers' qualifiers. This way, we would at least not be using shot and goal assists interchangeably.

I will thus make a PR that supports both SHOT_ASSIST & GOAL_ASSIST.

I've looked up a few definitions from different leagues, and they all seem to have subjective components: the assist must (1) be intentional and (2) have a direct influence on the goal being scored. Moreover, each league uses a slightly different definition, and these definitions change continuously.

I believe that most people would want the assists in kloppy to be identical to the official stats. And since I do not believe that you can derive "official assists" automatically from the data, I would rely on the data provider's judgment. I also wonder whether data providers (always) use their own definition or adopt the league's official stats in practice. For example, I know that Opta will sometimes correct an own goal to a goal (or vice versa) post-game.

I don't know what your use case is, but if you need something that is consistent across data providers, you could add another assist category named KLOPPY_ASSIST (I don't have a good name for it immediately) that is automatically derived. What would also be relevant to add is Opta's "Fantasy Goal Assist" which is a very broad definition of assists that is used in fantasy football and Football Manager and that can be automatically derived.

I'm also in favour of taking over the data provider's judgement to label passes as shot or goal assists (this should be resolved with: #281).
Having a KLOPPY_ASSIST, which is automatically derived from the raw data, is then indeed a good solution for also having a consistent metric to compare between data collected by different providers or from different leagues.