`game_id` is a varchar
Closed this issue · 3 comments
Flagging that from the NBA's perspective, `game_id` is a 10-digit character vector, but all the uncompressed CSV files in this repository have converted this variable to a numeric and dropped the first two digits. This does make for more efficient compression, as the first two digits are always "0", but it distorts what a researcher would have received from calling these APIs themselves, since the variable is now of a different type.
And just for reference, `game_id` is of the form XXXYYZZZZZ:
- XXX: the season prefix
  - 001 - preseason
  - 002 - regular season
  - 003 - all-star
  - 004 - playoffs
  - 005 - play-in
- YY: the two-digit start year of the season (2023-2024 would be 23)
- ZZZZZ: the sequenced game number
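For anyone who wants to split the ID into those pieces, here is a minimal sketch based only on the layout described above. The function name `parse_game_id`, the `SEASON_TYPES` mapping, and the sample ID are my own illustrations, not part of the NBA API or this repository.

```python
# Season prefixes as listed above; any other prefix is reported as "unknown".
SEASON_TYPES = {
    "001": "preseason",
    "002": "regular season",
    "003": "all-star",
    "004": "playoffs",
    "005": "play-in",
}


def parse_game_id(game_id: str) -> dict:
    """Split a 10-character game_id of the form XXXYYZZZZZ into its parts."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f"expected a 10-digit string, got {game_id!r}")
    prefix, start_year, sequence = game_id[:3], game_id[3:5], game_id[5:]
    return {
        "season_type": SEASON_TYPES.get(prefix, "unknown"),
        "season_start_yy": start_year,  # kept as text, e.g. "23" for 2023-2024
        "game_number": int(sequence),
    }


# Illustrative ID only: regular season (002), 2023-2024 (23), game number 1.
print(parse_game_id("0022300001"))
```

Note that this only works on the full string form; once the leading "00" is stripped and the value is stored as a number, the fixed positions no longer line up.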
@atlhawksfanatic, you are right: GAME_ID has been converted to a number, and the two leading zeros have been removed. Initially, the dataset was not an exact copy of the raw data received from the NBA API; it had several transformations. I have since removed all transformations except the one on GAME_ID. I may make GAME_ID a string, so that the data in the dataset becomes an exact copy of the NBA API data.
I like the rawness of the data that you're providing. While it's a good skill to figure out how to find the various NBA API endpoints, determine all the parameters, and construct your own queries, it is ridiculous to wait hours or days to get all the information needed for the next steps of an analysis, and this repo fills that gap nicely.
You might be fine just noting in the documentation that GAME_ID is missing the left-padded "00".
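If the note goes into the docs, a short snippet like the following could accompany it. This is only a sketch of how a reader might put the padding back, assuming the value arrives as an integer or a digit string; `restore_game_id` is a hypothetical helper name, not something the repository provides.

```python
def restore_game_id(value) -> str:
    """Left-pad a numeric GAME_ID back to the 10-character string form."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 10:
        raise ValueError(f"cannot restore GAME_ID from {value!r}")
    # str.zfill pads with "0" on the left up to the requested width.
    return text.zfill(10)


# Illustrative value only: the numeric form with the leading "00" dropped.
print(restore_game_id(22300001))
```

With pandas, the equivalent on a whole column should be something like `df["GAME_ID"].astype(str).str.zfill(10)`, assuming the column was read as an integer type.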
@atlhawksfanatic, good point. I will add information about this to the README.