Erroneous action types for 3pt shots
Closed this issue ยท 17 comments
More just for awareness, but some (~20-30) of the records in the "shots" data frame have an action_type of "3pt" when their location puts them well inside the 3pt-line. All instances are missed shots so likely operator error. Just an FYI.
thanks @davescroggs !! i'm finding some other errors too now that i'm starting to analyse the data rather than make tibbles with it... ๐จ
g'day @davescroggs - could you help me with this? i'm trying to figure out how to isolate the records that are erroneous. how would you suggest identifying xy locations that fall within the 3pt shot arc?
Hey @jacquietran I found it by calculating the distance from the centre of the ring and finding any shots that were less than 6.75m and tagged as 3pt. I also plotted them to check they weren't corner 3's too, and from memory none of them were. Code for shot distance below.
shots %>%
mutate(x = x / 100 * 28,
y = y / 100 * 15,
shot_dist = sqrt((x - 1.6)^2 + (y - 7.5)^2 ))
Kia ora @davescroggs - ahh I see! Thanks for the help! I'll tidy this up today ๐ธ
Taking a closer look using data from games played before today 2021-12-19 (i.e., including games from the 2021 season that is in progress).
# Load libraries
library(tidyverse)
library(ggplot2)
library(ggforce)
#
shots %>%
mutate(
x_for_halfcourt_plot = if_else(x > 50, 100 - x, x),
x_adj_halfcourt = x_for_halfcourt_plot / 100 * 28,
y_adj = y / 100 * 15,
shot_dist = sqrt((x_adj_halfcourt - 1.6)^2 + (y_adj - 7.5)^2 )) %>%
filter(action_type == "3pt") %>%
filter(shot_dist < 6.75) %>%
mutate(
# Categorisation allowing for small degree of human error when coding 3-pt attempt locations
shot_category = case_when(
# Recorded location is erroneous when shot distance < 6 m from ring
shot_dist < 6 ~ "erroneous",
# Recorded location is accepted when shot distance >= 6 m from ring
shot_dist >= 6 ~ "accepted")) %>%
arrange(
desc(shot_dist)
) -> short_3pt_shots
# Set court features
add_forward_goal_circle <- function() geom_arc_bar(
aes(x0 = 50, y0 = 200, r0 = 0, r = 4.9/0.1525,
start = pi / 2, end = 3 / 2 * pi),
inherit.aes = FALSE)
add_key <- function() geom_rect(
xmin = 0, xmax = 5.8, ymin = 9.95, ymax = 5.05, col = "black", fill = NA)
add_3pt_arc <- function() geom_arc(
aes(x0 = 1.575, y0 = 7.5, r = 6.75,
start = pi - 0.1862454, end = 0.1862454),
col = "black", inherit.aes = FALSE)
add_halfcourt <- function() geom_rect(
xmin = 0, xmax = 14, ymin = 0, ymax = 15, col = "black",
fill = NA, inherit.aes = FALSE)
add_top_key <- function() geom_arc(
aes(x0 = 5.8, y0 = 7.5, r = 1.8, start = pi, end = 0),
col = "black", inherit.aes = FALSE)
add_backboard <- function() geom_segment(
aes(x = 1.2, xend = 1.2, y = 6.6, yend = 8.4), col = "black")
add_basket <- function() geom_circle(
aes(x0 = 1.675, y0 = 7.5, r = 0.45/2), col = "black", fill = NA,
inherit.aes = FALSE)
add_centre_circle <- function() geom_arc(
aes(x0 = 14, y0 = 7.5, r = 1.8, start = pi, end = 2*pi),
col = "black", inherit.aes = FALSE)
add_3ball_segment1 <- function() geom_segment(
aes(x = 0, xend = 2.825, y = 0.9, yend = 0.9), inherit.aes = FALSE)
add_3ball_segment2 <- function() geom_segment(
aes(x = 0, xend = 2.825, y = 14.1, yend = 14.1), inherit.aes = FALSE)
# Plot
p <- ggplot(
short_3pt_shots, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
aes(colour = shot_category),
stat = "identity", size = 3, alpha = 0.5)
p <- p + scale_colour_manual(
values = c(
"accepted" = "dodgerblue",
"erroneous" = "red"))
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()
Observations
- Leave the 2016 data as is.
- Check for available game vision corresponding to the games with erroneous shot locations and recode, either by:
- Recoding the XY location of the shot (and retaining
action_type == "3pt"
) - Recoding the action_type to
"2pt"
(and retaining the XY location as recorded)
- Recoding the XY location of the shot (and retaining
- Where there is no game video that can be used for cross-checking, and the data is inconclusive, then leave it as is.
- There is erroneous shot location data recorded as recently as the current season (which is only in Round 3). Maybe need to set up some data quality monitoring to detect errors in future?
Looking only at the 3-pt shot locations I have categorised as "erroneous" (allowing for some human error in specifying XY location)
short_3pt_shots_erroneous <- short_3pt_shots %>%
filter(shot_category == "erroneous")
# Plot
p <- ggplot(
short_3pt_shots_erroneous, aes(x = x_adj_halfcourt, y = y_adj))
p <- p + add_key()
p <- p + add_top_key()
p <- p + add_halfcourt()
p <- p + add_backboard()
p <- p + add_basket()
p <- p + add_3ball_segment1()
p <- p + add_3ball_segment2()
p <- p + add_3pt_arc()
p <- p + add_centre_circle()
p <- p + geom_point(
colour = "red", stat = "identity", size = 3, alpha = 0.5)
p <- p + coord_fixed(xlim = c(0,14), ylim = c(0,15))
p <- p + facet_wrap(~season, nrow = 2)
p <- p + theme_void()
short_3pt_shots_erroneous %>%
select(
season, page_id, action_number, team_name, team_name_opp,
shot_result, x, y, x_adj_halfcourt, y_adj, shot_dist, scoreboard_name) %>%
arrange(desc(season), desc(shot_dist)) %>%
gt::gt()
WNBL: 2021 / 2022
page_id = 2061074, action_number = 631
- link to live stats: page_id = 2061074
- action_number: 631
- originally recorded as: 3-pt shot missed by Lauren Scherf
- actual shot: 2-pt shot missed by Lauren Scherf
- corroborated by: https://www.youtube.com/watch?v=schN1XVwvsQ
page_id = 2061074, action_number = 642
- link to live stats: page_id = 2061074
- action_number: 642
- originally recorded as: 3-pt shot missed by Brittany Sykes
- actual shot: 2-pt shot missed by Brittany Sykes
- corroborated by: https://www.youtube.com/watch?v=schN1XVwvsQ
page_id = 1997422
- link to live stats: page_id = 1997422
- action_number: 635
- originally recorded as: 3-pt shot missed by Sara Blicavs
- actual shot: 2-pt shot missed by Sara Blicavs
- corroborated by: video - https://youtu.be/KV_2nJLOWqM?t=4000
data tidying to do for all of the above:
- in
shots
andpbp
data, update shot type to "2pt" - in
shots
, keep XY location as is - in
box_scores
andbox_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %
Note: In my checking process, I looked through the pbp
data as well and there may be some issues potentially with the action numbers - they might not always be in chronological order??? Will investigate further and open a separate issue about this if needed.
2020
page_id = 1777499
- link to live stats: page_id = 1777499
- action_number: 151
- originally recorded as: 3-pt shot missed by Stella Beck
- actual shot: not verified
- no data tidying to do at this stage; shot location is at the top of the key so it could have been a 3-pt attempt with location incorrectly recorded, but not possible to know without more info / game video
Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription). Videos online (on Youtube and Facebook) are highlight reels only and do not include this particular shot, taken at the end of the 1st quarter.
2019
page_id = 1330488
- link to live stats: page_id = 1330488
- action_number: 260
- originally recorded as: 3-pt shot missed by Mercedes Russell
- actual shot: not verified
- data tidying to do, given XY location is very close to the ring and Mercedes Russell is a Centre who does not shoot 3s (as per her WNBA stats):
- in
shots
andpbp
data, update shot type to "2pt" - in
shots
, keep XY location as is - in
box_scores
andbox_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %
- in
Couldn't find video of the full game (maybe accessible via Kayo as VOD, but I don't have a subscription).
2018
page_id = 1087577
- link to live stats: page_id = 1087577
- action_number: 277
- originally recorded as: 3-pt shot missed by Lauren Nicholson
- actual shot: not verified
- data tidying to do, given XY location is inside the key:
- in
shots
andpbp
data, update shot type to "2pt" - in
shots
, keep XY location as is - in
box_scores
andbox_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %
- in
Couldn't find video of the full game.
2017
page_id == 681961
- link to live stats: page_id = 681961
- action_number: 193
- originally recorded as: 3-pt shot missed by Vanessa Panousis
- actual shot: not verified
- no data tidying to do at this stage: shot location is at the elbow; the marker is well inside the 3-pt line but Panousis does shoot 3s so more info / game video is required to verify shot type.
page_id == 681927
- link to live stats: page_id = 681927
- action_number: 262
- originally recorded as: 3-pt shot missed by Mikhaela Donnelly
- actual shot: not verified - only relevant video i could find is this highlights reel which does not include this particular shot
- no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Donnelly does shoot 3s so more info / game video is required to verify shot type.
Very short shot distances
There are a set of distances recorded with action_type == "3pt"
but are quite close to the ring. All of these shots have an XY location that is < 2 m from the ring:
- page_id == 681947, action_number == 61 (M. Bass)
- page_id == 681956, action_number == 235 (S. Greaves)
- page_id == 681958, action_number == 107 (T. Roberts)
- page_id == 681958, action_number == 188 (K. Pedersen)
- page_id == 681958, action_number == 319 (C. Williams)
- page_id == 681960, action_number == 79 (A. Wehrung)
- page_id == 681968, action_number == 397 (A. Kunek)
- page_id == 681974, action_number == 306 (L. Scherf)
- page_id == 681974, action_number == 359 (D. Garbin)
- page_id == 681977, action_number == 307 (A. Taylor)
For each of the above, data tidying to do:
- in
shots
andpbp
data, update shot type to "2pt" - in
shots
, keep XY location as is - in
box_scores
andbox_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %
2015
page_id == 137266
- link to live stats: page_id = 137266
- action_number: 208
- originally recorded as: 3-pt shot missed by Rachel Jarry
- actual shot: not verified
- no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Jarry does shoot 3s so more info / game video is required to verify shot type.
page_id == 137248
- link to live stats: page_id = 137248
- action_number: 471
- originally recorded as: 3-pt shot missed by Louella Tomlinson
- actual shot: not verified
- no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Tomlinson does shoot 3s so more info / game video is required to verify shot type.
Very short shot distances
These shots have an XY location that is < 2 m from the ring:
- page_id == 137261, action_number == 394 (A. Bishop)
- page_id == 137309, action_number == 398 (K. Ebzery)
For each of the above, data tidying to do:
- in
shots
andpbp
data, update shot type to "2pt" - in
shots
, keep XY location as is - in
box_scores
andbox_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %
2014
page_id = 64536
- link to live stats: page_id = 64536
- action_number: 460
- originally recorded as: 3-pt shot missed by Maddie Garrick
- actual shot: not verified
- no data tidying to do at this stage: shot location marker is well inside the 3-pt line but Garrick does shoot 3s so more info / game video is required to verify shot type.
Short shot distances
These shots have an XY location that is inside the key:
- page_id == 64580, action_number == 219 (S. Greaves)
- page_id == 64597, action_number == 454 (S. Batkovic)
For each of the above, data tidying to do:
- in
shots
andpbp
data, update shot type to "2pt" - in
shots
, keep XY location as is - in
box_scores
andbox_scores_detailed
, adjust two pointers attempted and % and three pointers attempted and %
summarising the checks above:
or simpler, focusing on shots to be updated only:
short_3pt_shots_erroneous %>%
select(
season, page_id, action_number, team_name, team_name_opp, shot_dist,
scoreboard_name) %>%
arrange(desc(season), desc(shot_dist)) %>%
mutate(
to_be_updated = case_when(
page_id %in% c(
1997422,
1330488,
1087577,
681947,
681956,
681960,
681968,
681977,
137261,
137309,
64580,
64597) ~ TRUE,
page_id == 681958 &
action_number == 107 ~ TRUE,
page_id == 681958 &
action_number == 188 ~ TRUE,
page_id == 681958 &
action_number == 319 ~ TRUE,
page_id == 681974 &
action_number == 306 ~ TRUE,
page_id == 681974 &
action_number == 359 ~ TRUE,
TRUE ~ FALSE)) %>%
filter(to_be_updated == TRUE) %>%
gt::gt()
For each of the shots listed in the screenshot immediately above, data tidying to do:
- in shots and pbp data, update shot type to "2pt"
- in shots, keep XY location as is
- in box_scores and box_scores_detailed, adjust two pointers attempted and % and three pointers attempted and %
^ oops, that last commit message was meant to read "in 2017 games"
This will be ready to close when the beta branch of {wnblr} is merged.