modeling discussion
We might have discussed this already, but can we use the player's starting status (bench / starter) directly as a feature? Or should I build a feature from the player's status over the previous 5 or 10 games?
Starting status is okay: many sites report it before a game, so it's reasonable to assume it would be available at prediction time. It's a game-day-specific stat. But good question.
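If we still want the previous-games version as an extra feature, a minimal pandas sketch might look like this. It assumes hypothetical column names `playDispNm` (player) and `gmDate` (game date) alongside `playStat`, and uses `shift(1)` so only games before the current one feed the rolling share, which keeps it known before tip-off.

```python
import pandas as pd

def prior_start_rate(df: pd.DataFrame, window: int = 5,
                     player_col: str = "playDispNm",  # assumed column name
                     date_col: str = "gmDate",        # assumed column name
                     status_col: str = "playStat") -> pd.DataFrame:
    """Add the share of each player's previous `window` games that they started.

    Only games strictly before the current one are used (shift(1)).
    A player's first game has no history, so it gets NaN.
    """
    out = df.sort_values([player_col, date_col]).copy()
    # 1.0 if the player started, 0.0 otherwise ("Bench" or anything else).
    started = (out[status_col] == "Starter").astype(float)
    out[f"start_rate_prev{window}"] = started.groupby(out[player_col]).transform(
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out
```

Calling it with `window=10` gives the 10-game version; `min_periods=1` means early-season rows average over however many prior games exist.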
@Zhang-Haipeng The dataset we have includes starting status in the `playStat` column, stored as a string: either `Starter` or `Bench`. So we will have to convert it to 0/1 when we wrangle.
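A minimal sketch of that conversion in pandas, assuming the values are exactly `Starter` and `Bench` (possibly with stray whitespace). The output column name `is_starter` is a hypothetical choice; any other value raises an error instead of being silently mapped.

```python
import pandas as pd

# Assumed encoding: 1 = started the game, 0 = came off the bench.
STARTER_MAP = {"Starter": 1, "Bench": 0}

def encode_starter(df: pd.DataFrame, col: str = "playStat") -> pd.DataFrame:
    """Return a copy of df with a 0/1 `is_starter` column derived from `col`."""
    out = df.copy()
    # Strip stray whitespace so " Starter" still matches the mapping.
    encoded = out[col].astype(str).str.strip().map(STARTER_MAP)
    if encoded.isna().any():
        # Fail loudly on anything other than Starter/Bench instead of guessing.
        bad = sorted(out.loc[encoded.isna(), col].astype(str).unique())
        raise ValueError(f"unexpected {col} values: {bad}")
    out["is_starter"] = encoded.astype(int)
    return out
```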
Thanks.
Another question: I assume `2012-18_playerBoxScore.csv` is the only CSV we'll actually use, right?
I've done one version of the wrangling/feature engineering. I'll send a PR so you guys can review it while I continue with the EDA.
Yes, I think that's the only one we will use. I'll take a quick look through the other datasets to see if there's anything useful.
Ya, let's just stick to `2012-18_playerBoxScore.csv`.
I'm in the process of setting up the download script so we can save it in our repo.
Another question.
In the references, what do features like `vegas_diff`, `team_top_5_ave`, and `above_top_5_ave` mean? Is there any way we can include those as well? They seem to have pretty high correlations with the target.
@Zhang-Haipeng We can try to derive `team_top_5_ave`, but not `vegas_diff`, since that is not in our dataset.
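One possible way to derive it, assuming `team_top_5_ave` means the mean of the five largest values of some per-player stat within each team-game. That reading is a guess from the name, not a definition taken from the references, and the `teamAbbr`/`gmDate` column names are assumptions too. To avoid leaking the target, `stat_col` should probably be a lagged column (e.g. a previous-games average) rather than the same game's points.

```python
import pandas as pd

def team_top_n_ave(df: pd.DataFrame, stat_col: str, n: int = 5,
                   team_col: str = "teamAbbr",          # assumed column name
                   game_col: str = "gmDate") -> pd.DataFrame:  # assumed column name
    """Hypothetical reading of team_top_5_ave: for every row, the mean of the
    n largest `stat_col` values among that team's players in the same game."""
    out = df.copy()
    # nlargest(n) keeps all values when a team has fewer than n players listed.
    out[f"team_top_{n}_ave"] = out.groupby([team_col, game_col])[stat_col].transform(
        lambda s: s.nlargest(n).mean()
    )
    return out
```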