jacquietran/wnblr

Erroneous action types for 3pt shots

Closed this issue · 17 comments

More just for awareness, but some (~20-30) of the records in the "shots" data frame have an action_type of "3pt" when their location puts them well inside the 3pt-line. All instances are missed shots so likely operator error. Just an FYI.

thanks @davescroggs !! i'm finding some other errors too now that i'm starting to analyse the data rather than make tibbles with it... 🚨

g'day @davescroggs - could you help me with this? i'm trying to figure out how to isolate the records that are erroneous. how would you suggest identifying xy locations that fall within the 3pt shot arc?

Hey @jacquietran I found it by calculating the distance from the centre of the ring and finding any shots that were less than 6.75m and tagged as 3pt. I also plotted them to check they weren't corner 3's too, and from memory none of them were. Code for shot distance below.

shots %>%
mutate(x = x / 100 * 28,
y = y / 100 * 15,
shot_dist = sqrt((x - 1.6)^2 + (y - 7.5)^2 ))
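A distance check alone can misflag corner 3s, because the 3-pt line runs straight along the sidelines there and sits closer to the ring than 6.75 m. Here is a minimal base R sketch that folds the corner check into the test instead of relying on a plot. The helper name `inside_3pt_line()` is hypothetical, and the 0.9 m corner offset is an assumption taken from the corner segments drawn in the plotting code later in this thread.

```r
# Court constants in metres. The 28 x 15 m court and the 0-100 scaling of
# x and y follow the snippet above; the 0.9 m corner offset is an assumption
# taken from the plotting code later in this thread.
ring_x <- 1.6
ring_y <- 7.5
arc_radius <- 6.75
sideline_gap <- 0.9

# Hypothetical helper: TRUE when a recorded location is inside the 3-pt line,
# treating the strips along each sideline as corner-3 territory.
inside_3pt_line <- function(x_pct, y_pct) {
  x_m <- x_pct / 100 * 28
  y_m <- y_pct / 100 * 15
  # Mirror far-end shots onto the near half-court
  x_m <- ifelse(x_m > 14, 28 - x_m, x_m)
  shot_dist <- sqrt((x_m - ring_x)^2 + (y_m - ring_y)^2)
  in_corner_strip <- y_m < sideline_gap | y_m > 15 - sideline_gap
  shot_dist < arc_radius & !in_corner_strip
}

# Near the ring, a corner 3, just outside the arc, near the ring at the far end
inside_3pt_line(x_pct = c(10, 5.7, 30, 90), y_pct = c(50, 5.5, 50, 50))
```

Any row with `action_type == "3pt"` for which this returns `TRUE` is a candidate for review.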

Kia ora @davescroggs - ahh I see! Thanks for the help! I'll tidy this up today ๐Ÿ˜ธ

Taking a closer look using data from games played before today 2021-12-19 (i.e., including games from the 2021 season that is in progress).

# Load libraries
library(tidyverse) # includes ggplot2
library(ggforce)
library(wnblr) # provides the shots data frame

# Find 3-pt attempts recorded inside the 3-pt arc (< 6.75 m from the ring)
shots %>%
	mutate(
		# Mirror far-end shots onto one half-court, then convert to metres
		x_for_halfcourt_plot = if_else(x > 50, 100 - x, x),
		x_adj_halfcourt = x_for_halfcourt_plot / 100 * 28,
		y_adj = y / 100 * 15,
		shot_dist = sqrt((x_adj_halfcourt - 1.6)^2 + (y_adj - 7.5)^2)) %>%
	filter(action_type == "3pt") %>%
	filter(shot_dist < 6.75) %>%
	mutate(
		# Categorisation allowing for small degree of human error when coding 3-pt attempt locations
		shot_category = case_when(
			# Recorded location is erroneous when shot distance < 6 m from ring
			shot_dist < 6  ~ "erroneous",
			# Recorded location is accepted when shot distance >= 6 m from ring
			shot_dist >= 6 ~ "accepted")) %>%
	arrange(
		desc(shot_dist)
	) -> short_3pt_shots

# Set court features
add_forward_goal_circle <- function() geom_arc_bar(
	aes(x0 = 50, y0 = 200, r0 = 0, r = 4.9/0.1525,
			start = pi / 2, end = 3 / 2 * pi),
	inherit.aes = FALSE)
add_key <- function() geom_rect(
	xmin = 0, xmax = 5.8, ymin = 9.95, ymax = 5.05, col = "black", fill = NA)
add_3pt_arc <- function() geom_arc(
	aes(x0 = 1.575, y0 = 7.5, r = 6.75,
			start = pi - 0.1862454, end = 0.1862454),
	col = "black", inherit.aes = FALSE)
add_halfcourt <- function() geom_rect(
	xmin = 0, xmax = 14, ymin = 0, ymax = 15, col = "black",
	fill = NA, inherit.aes = FALSE)
add_top_key <- function() geom_arc(
	aes(x0 = 5.8, y0 = 7.5, r = 1.8, start = pi, end = 0),
	col = "black", inherit.aes = FALSE)
add_backboard <- function() geom_segment(
	aes(x = 1.2, xend = 1.2, y = 6.6, yend = 8.4), col = "black")
add_basket <- function() geom_circle(
	aes(x0 = 1.675, y0 = 7.5, r = 0.45/2), col = "black", fill = NA,
	inherit.aes = FALSE)
add_centre_circle <- function() geom_arc(
	aes(x0 = 14, y0 = 7.5, r = 1.8, start = pi, end = 2*pi),
	col = "black", inherit.aes = FALSE)
add_3ball_segment1 <- function() geom_segment(
	aes(x = 0, xend = 2.825, y = 0.9, yend = 0.9), inherit.aes = FALSE)
add_3ball_segment2 <- function() geom_segment(
	aes(x = 0, xend = 2.825, y = 14.1, yend = 14.1), inherit.aes = FALSE)

# Plot
p <- ggplot(
	short_3pt_shots, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
	aes(colour = shot_category),
	stat = "identity", size = 3, alpha = 0.5)
p <- p + scale_colour_manual(
	values = c(
		"accepted" = "dodgerblue",
		"erroneous" = "red"))
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()

image
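A note on the constant 0.1862454 in `add_3pt_arc()`: it looks to be the angle at which the 6.75 m arc meets the ends of the corner segments at x = 2.825. The base R check below is a sketch of that reasoning, not a statement of how the value was originally derived. It assumes ggforce's arc convention, where angle 0 points along the positive y-axis and angles increase clockwise.

```r
# Values copied from add_3pt_arc() and the corner segments in the code above
arc_x0 <- 1.575
arc_y0 <- 7.5
arc_r <- 6.75
segment_xend <- 2.825
theta <- 0.1862454

# Angle from the vertical through the arc centre at which a circle of
# radius arc_r reaches x = segment_xend
theta_check <- asin((segment_xend - arc_x0) / arc_r)

# Arc end points, assuming angle 0 points along +y and increases clockwise
arc_end <- c(x = arc_x0 + arc_r * sin(theta),
             y = arc_y0 + arc_r * cos(theta))
arc_start <- c(x = arc_x0 + arc_r * sin(pi - theta),
               y = arc_y0 + arc_r * cos(pi - theta))

abs(theta - theta_check)  # small, so the two angles agree closely
round(rbind(arc_start, arc_end), 2)
```

Under those assumptions both arc ends land at x of about 2.82, slightly beyond the corner segments drawn at y = 0.9 and y = 14.1, which is close enough for plotting.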

Observations

  • Leave the 2016 data as is.
  • Check for available game vision corresponding to the games with erroneous shot locations and recode, either by:
    • Recoding the XY location of the shot (and retaining action_type == "3pt")
    • Recoding the action_type to "2pt" (and retaining the XY location as recorded)
  • Where there is no game video that can be used for cross-checking, and the data is inconclusive, then leave it as is.
  • There is erroneous shot location data recorded as recently as the current season (which is only in Round 3). Maybe need to set up some data quality monitoring to detect errors in future?
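For the data quality monitoring idea, here is a minimal sketch of a check that could run whenever new games are added. The function name `flag_short_3pt()` and the toy data are hypothetical; the 6 m threshold and the court conversion are the same ones used in the code above.

```r
library(dplyr)

# Hypothetical check: return 3-pt attempts recorded closer than `min_dist`
# metres to the ring, using the same half-court conversion as above.
flag_short_3pt <- function(shots, min_dist = 6) {
  shots %>%
    mutate(
      x_adj_halfcourt = if_else(x > 50, 100 - x, x) / 100 * 28,
      y_adj = y / 100 * 15,
      shot_dist = sqrt((x_adj_halfcourt - 1.6)^2 + (y_adj - 7.5)^2)) %>%
    filter(action_type == "3pt", shot_dist < min_dist)
}

# Toy data standing in for newly scraped shots (not real WNBL records)
new_shots <- tibble(
  page_id = c(1, 1, 2),
  action_number = c(10, 20, 30),
  action_type = c("3pt", "2pt", "3pt"),
  x = c(10, 10, 70),
  y = c(50, 50, 50))

flagged <- flag_short_3pt(new_shots)
if (nrow(flagged) > 0) {
  message(nrow(flagged), " 3-pt attempt(s) recorded within 6 m of the ring")
}
```

In the toy data only the first row is flagged: the second is a 2-pt attempt, and the third mirrors to 6.8 m from the ring.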

Looking only at the 3-pt shot locations I have categorised as "erroneous" (allowing for some human error in specifying XY location):

short_3pt_shots_erroneous <- short_3pt_shots %>%
	filter(shot_category == "erroneous")

# Plot
p <- ggplot(
	short_3pt_shots_erroneous, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
	colour = "red", stat = "identity", size = 3, alpha = 0.5)
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()

image

short_3pt_shots_erroneous %>%
	select(
		season, page_id, action_number, team_name, team_name_opp,
		shot_result, x, y, x_adj_halfcourt, y_adj, shot_dist, scoreboard_name) %>%
	arrange(desc(season), desc(shot_dist)) %>%
	gt::gt()

image

WNBL: 2021 / 2022

page_id = 2061074, action_number = 631

page_id = 2061074, action_number = 642

page_id = 1997422

data tidying to do for all of the above:

  • in shots and pbp data, update shot type to "2pt"
  • in shots, keep XY location as is
  • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %
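The box score adjustment is simple arithmetic: every flagged shot in this issue was a miss, so makes stay the same and one attempt moves from the 3-pt column to the 2-pt column. Here is a minimal sketch with a made-up stat line. The function and argument names are hypothetical and do not reflect the actual column names in box_scores or box_scores_detailed.

```r
# Hypothetical helper: move `n` missed 3-pt attempts to 2-pt attempts and
# recompute both percentages. Makes are unchanged because the recoded shots
# were all misses.
reclassify_missed_3pt <- function(fg2_made, fg2_att, fg3_made, fg3_att, n = 1) {
  fg2_att <- fg2_att + n
  fg3_att <- fg3_att - n
  pct <- function(made, att) if (att > 0) round(100 * made / att, 1) else NA_real_
  list(
    two_pt_attempted = fg2_att,
    two_pt_pct = pct(fg2_made, fg2_att),
    three_pt_attempted = fg3_att,
    three_pt_pct = pct(fg3_made, fg3_att))
}

# Made-up line: 4/9 on 2s and 1/4 on 3s, where one missed "3" was really a 2
adjusted <- reclassify_missed_3pt(fg2_made = 4, fg2_att = 9, fg3_made = 1, fg3_att = 4)
adjusted
```

Total field goals attempted do not change. Any team-level totals derived from the player rows would presumably need the same adjustment.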

Note: In my checking process, I looked through the pbp data as well and there may be issues with the action numbers: they might not always be in chronological order. Will investigate further and open a separate issue about this if needed.

2020

page_id = 1777499

  • link to live stats: page_id = 1777499
  • action_number: 151
  • originally recorded as: 3-pt shot missed by Stella Beck
  • actual shot: not verified
  • no data tidying to do at this stage; shot location is at the top of the key so it could have been a 3-pt attempt with location incorrectly recorded, but not possible to know without more info / game video

Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription). Videos online (on Youtube and Facebook) are highlight reels only and do not include this particular shot, taken at the end of the 1st quarter.

2019

page_id = 1330488

  • link to live stats: page_id = 1330488
  • action_number: 260
  • originally recorded as: 3-pt shot missed by Mercedes Russell
  • actual shot: not verified
  • data tidying to do, given XY location is very close to the ring and Mercedes Russell is a Centre who does not shoot 3s (as per her WNBA stats):
    • in shots and pbp data, update shot type to "2pt"
    • in shots, keep XY location as is
    • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %

Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription).

2018

page_id = 1087577

  • link to live stats: page_id = 1087577
  • action_number: 277
  • originally recorded as: 3-pt shot missed by Lauren Nicholson
  • actual shot: not verified
  • data tidying to do, given XY location is inside the key:
    • in shots and pbp data, update shot type to "2pt"
    • in shots, keep XY location as is
    • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %

Couldn't find video of the full game.

2017

page_id == 681961

  • link to live stats: page_id = 681961
  • action_number: 193
  • originally recorded as: 3-pt shot missed by Vanessa Panousis
  • actual shot: not verified
  • no data tidying to do at this stage: shot location is at the elbow; the marker is well inside the 3-pt line but Panousis does shoot 3s so more info / game video is required to verify shot type.

page_id == 681927

  • link to live stats: page_id = 681927
  • action_number: 262
  • originally recorded as: 3-pt shot missed by Mikhaela Donnelly
  • actual shot: not verified - only relevant video i could find is this highlights reel which does not include this particular shot
  • no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Donnelly does shoot 3s so more info / game video is required to verify shot type.

Very short shot distances

There is a set of shots recorded with action_type == "3pt" that are quite close to the ring. All of these shots have an XY location that is < 2 m from the ring:

  • page_id == 681947, action_number == 61 (M. Bass)
  • page_id == 681956, action_number == 235 (S. Greaves)
  • page_id == 681958, action_number == 107 (T. Roberts)
  • page_id == 681958, action_number == 188 (K. Pedersen)
  • page_id == 681958, action_number == 319 (C. Williams)
  • page_id == 681960, action_number == 79 (A. Wehrung)
  • page_id == 681968, action_number == 397 (A. Kunek)
  • page_id == 681974, action_number == 306 (L. Scherf)
  • page_id == 681974, action_number == 359 (D. Garbin)
  • page_id == 681977, action_number == 307 (A. Taylor)

For each of the above, data tidying to do:

  • in shots and pbp data, update shot type to "2pt"
  • in shots, keep XY location as is
  • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %

2015

page_id == 137266

  • link to live stats: page_id = 137266
  • action_number: 208
  • originally recorded as: 3-pt shot missed by Rachel Jarry
  • actual shot: not verified
  • no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Jarry does shoot 3s so more info / game video is required to verify shot type.

page_id == 137248

  • link to live stats: page_id = 137248
  • action_number: 471
  • originally recorded as: 3-pt shot missed by Louella Tomlinson
  • actual shot: not verified
  • no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Tomlinson does shoot 3s so more info / game video is required to verify shot type.

Very short shot distances

These shots have an XY location that is < 2 m from the ring:

  • page_id == 137261, action_number == 394 (A. Bishop)
  • page_id == 137309, action_number == 398 (K. Ebzery)

For each of the above, data tidying to do:

  • in shots and pbp data, update shot type to "2pt"
  • in shots, keep XY location as is
  • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %

2014

page_id = 64536

  • link to live stats: page_id = 64536
  • action_number: 460
  • originally recorded as: 3-pt shot missed by Maddie Garrick
  • actual shot: not verified
  • no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Garrick does shoot 3s so more info / game video is required to verify shot type.

Short shot distances

These shots have an XY location that is inside the key:

  • page_id == 64580, action_number == 219 (S. Greaves)
  • page_id == 64597, action_number == 454 (S. Batkovic)

For each of the above, data tidying to do:

  • in shots and pbp data, update shot type to "2pt"
  • in shots, keep XY location as is
  • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %

summarising the checks above:

image

or simpler, focusing on shots to be updated only:

short_3pt_shots_erroneous %>%
	select(
		season, page_id, action_number, team_name, team_name_opp, shot_dist,
		scoreboard_name) %>%
	arrange(desc(season), desc(shot_dist)) %>%
	mutate(
		to_be_updated = case_when(
			page_id %in% c(
				1997422,
				1330488,
				1087577,
				681947,
				681956,
				681960,
				681968,
				681977,
				137261,
				137309,
				64580,
				64597)               ~ TRUE,
			page_id == 681958 &
				action_number == 107 ~ TRUE,
			page_id == 681958 &
				action_number == 188 ~ TRUE,
			page_id == 681958 &
				action_number == 319 ~ TRUE,
			page_id == 681974 &
				action_number == 306 ~ TRUE,
			page_id == 681974 &
				action_number == 359 ~ TRUE,
			TRUE                   ~ FALSE)) %>%
	filter(to_be_updated) %>%
	gt::gt()

image

For each of the shots listed in the screenshot immediately above, data tidying to do:

  • in shots and pbp data, update shot type to "2pt"
  • in shots, keep XY location as is
  • in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %
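One way to apply the shot type update is to keep the affected shots in a small key table and patch the data with `dplyr::rows_update()`, which changes only the columns supplied and leaves the XY location untouched. This is a sketch on toy data, not the code used in the package. The two keys are copied from the 2017 list above, while the extra row and all coordinates are made up.

```r
library(dplyr)

# Keys of shots to recode (two examples copied from the 2017 list above)
to_recode <- tibble(
  page_id = c(681958, 681974),
  action_number = c(107, 306),
  action_type = "2pt")

# Toy stand-in for `shots`; coordinates and the middle row are made up
shots_toy <- tibble(
  page_id = c(681958, 681958, 681974),
  action_number = c(107, 108, 306),
  action_type = c("3pt", "3pt", "3pt"),
  x = c(8, 30, 9),
  y = c(50, 50, 48))

# Only action_type is overwritten for matching keys; x and y are retained
shots_fixed <- rows_update(
  shots_toy, to_recode, by = c("page_id", "action_number"))
shots_fixed
```

Matching on both page_id and action_number matters for games such as 681958 and 681974, where only some 3-pt attempts in the game are being recoded.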

^ oops, that last commit message was meant to read "in 2017 games"

This will be ready to close when the beta branch of {wnblr} is merged.