Properly handle team-specific data availability for certain years
Closed this issue · 4 comments
Related to #64 - some teams have data missing for years that other teams do not. For example, most/all top-tier D1 programs have been in D1 for quite some time and thus have the complete dataset from 1999 all the way through present day.
However, more recent D-1 admits will have limited historical availability. See Merrimack, who has data only beginning in 2020.
This also means that any team that, between 1999 and the present, (a) was in D-1, (b) did not participate in D-1 for one or more years, then (c) returned to D-1 may also have some years missing in between.
A simple solution for this is to check that navigating to a particular team's season page in team.py is not redirected back to the main page (which seems to be how KenPom is designed to behave when requesting data that doesn't exist; see how https://kenpom.com/team.php?team=Merrimack&y=2019 behaves).
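A minimal sketch of that redirect check, assuming the HTTP client exposes the final URL after redirects (for example `response.url` in `requests`). The helper name `was_redirected_home` is hypothetical and is not part of the existing team.py:

```python
from urllib.parse import urlsplit


def was_redirected_home(requested_url: str, final_url: str) -> bool:
    """Return True if the request did not end up on the page it asked for.

    Only the URL paths are compared, so a team-page request that lands on
    the site root (or any other page) counts as a redirect.
    """
    requested_path = urlsplit(requested_url).path
    final_path = urlsplit(final_url).path
    return final_path != requested_path
```

A caller could raise a ValueError for the requested season whenever this returns True, instead of trying to parse the main page as if it were a team page.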
I drew this up quickly in my JavaScript console when I had a second today:
[...document.querySelector('#years-wrapper #years-container').innerText.matchAll(/(?<=\s)\d{2}(?=\s)/g)].map(m => parseInt(m[0], 10) > (new Date().getFullYear() % 100) + 1 ? parseInt('19' + m[0], 10) : parseInt('20' + m[0], 10))
When run against Villanova's team page, it returns:
[1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
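For reference, a rough Python equivalent of the console snippet above, assuming the text of the years container has already been extracted from the page. The function name `expand_two_digit_years` and the `current_year` parameter are hypothetical, added only to make the sketch testable:

```python
import re
from datetime import date
from typing import List, Optional


def expand_two_digit_years(text: str, current_year: Optional[int] = None) -> List[int]:
    """Expand whitespace-delimited two-digit years into four-digit years.

    Values greater than next year's two-digit suffix are treated as 19xx;
    everything else is treated as 20xx, mirroring the JavaScript one-liner.
    """
    if current_year is None:
        current_year = date.today().year
    pivot = current_year % 100 + 1
    years = []
    # Same pattern as the JavaScript: exactly two digits between whitespace.
    for match in re.finditer(r"(?<=\s)\d{2}(?=\s)", text):
        yy = int(match.group())
        years.append(1900 + yy if yy > pivot else 2000 + yy)
    return years
```

A requested season could then be considered "valid" for a team if it appears in the returned list.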
This type of logic might be another way to determine whether a season parameter value is "valid". I'm going to do more digging to see what the best way might be - after giving it a bit more thought I'm of the belief that simply launching the request and checking if there was a redirect is ultimately more efficient.
> giving it a bit more thought I'm of the belief that simply launching the request and checking if there was a redirect is ultimately more efficient.
I would have to agree with this statement. I think preventing a request if the page doesn't exist would be best, but I don't think that is worth it. An incorrect year request is only one page request; it is not as if multiple requests are needed.
However, if we wanted a check before requests, we could have a dict (or another structure) that holds all the teams and the years associated with each team. I would still think a redirect check would be simpler and less error-prone, as the dict would need updating at a bare minimum every year. Probably not worth the overhead either.
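To make the trade-off concrete, here is a minimal sketch of the dict-based pre-check. The table and the name `is_season_available` are hypothetical; the two entries only reflect the examples in this thread (Villanova listed from 1999 through 2024, Merrimack starting in 2020, with the same upper bound assumed), and a real table would need every team plus a yearly update:

```python
from typing import Dict, Set

# Hypothetical, hand-maintained table of seasons with data for each team.
# Only illustrative entries are shown; the upper bound must be bumped yearly.
AVAILABLE_SEASONS: Dict[str, Set[int]] = {
    "Villanova": set(range(1999, 2025)),
    "Merrimack": set(range(2020, 2025)),
}


def is_season_available(team: str, season: int) -> bool:
    """Return True if the table lists data for this team and season."""
    return season in AVAILABLE_SEASONS.get(team, set())
```

The lookup itself is trivial; the maintenance burden of keeping the table current is what makes the redirect check the simpler option.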
Also, I can pick this issue up. I am planning to work on the redirect unless there is opposition.
As @WakeUpWaffles mentioned in the thread for #75, this is actually already handled with appropriate calls to get_valid_teams()