UBC-MDS/NBA-Minutes-Predictor

modeling discussion

Closed this issue · 7 comments

We might have discussed this already, but can we use the player's starting status (bench/starter) directly as a feature? Or should I make a feature from their status over the previous 5/10 games?
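For reference, the "previous 5/10 games" option could look roughly like the pandas sketch below. It assumes a 0/1 starter column named `is_starter` (a hypothetical name) and player/date columns named `playDispNm` and `gmDate` (assumed names; check them against the actual CSV header). The `shift(1)` keeps the current game's status out of its own feature.

```python
import pandas as pd

def add_prev_start_rate(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Share of a player's previous `window` games that they started.

    Assumed (hypothetical) columns: "playDispNm" (player), "gmDate"
    (ISO date string or datetime) and "is_starter" (0/1).
    """
    # Sort chronologically within each player so the rolling window looks back in time
    df = df.sort_values(["playDispNm", "gmDate"]).copy()
    col = f"start_rate_prev_{window}"
    df[col] = (
        df.groupby("playDispNm")["is_starter"]
        # shift(1) drops the current game; a player's first game has no history (NaN)
        .transform(lambda s: s.shift(1).rolling(window, min_periods=1).mean())
    )
    return df

games = pd.DataFrame({
    "playDispNm": ["Player A"] * 4,
    "gmDate": ["2012-10-30", "2012-11-01", "2012-11-03", "2012-11-05"],
    "is_starter": [1, 0, 1, 1],
})
print(add_prev_start_rate(games, window=2))
```

Calling it once with `window=5` and once with `window=10` gives both candidate features.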

Starting status is okay: many sites report it before a game, so it's reasonable to assume it would be available at prediction time. It's a game-day-specific stat. But good question.

@Zhang-Haipeng The dataset we have includes starting status in the playStat column, stored as a string (either Starter or Bench), so we will have to convert it to 0/1 when we wrangle.
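A minimal sketch of that 0/1 conversion in pandas, assuming the raw values are exactly `Starter` and `Bench`. The output column name `is_starter` is hypothetical. The check for unexpected values is there so a surprise label fails loudly during wrangling instead of silently becoming 0.

```python
import pandas as pd

def encode_starter(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 0/1 "is_starter" column (hypothetical name) from playStat."""
    df = df.copy()
    # Strip stray whitespace in case the raw CSV has padded strings
    status = df["playStat"].str.strip()
    unexpected = set(status.unique()) - {"Starter", "Bench"}
    if unexpected:
        raise ValueError(f"Unexpected playStat values: {unexpected}")
    # Starter -> 1, Bench -> 0
    df["is_starter"] = (status == "Starter").astype(int)
    return df

box = pd.DataFrame({"playStat": ["Starter", "Bench", "Starter"]})
print(encode_starter(box))
```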

Thanks.
Another question: I assume 2012-18_playerBoxScore.csv is the only CSV we'll actually use, right?
I've done one version of wrangling/feature engineering. I'll send a PR so you guys can check it while I continue with the EDA.

Yes, I think that's the only one we will use. I'll do a quick look through the other datasets to see if there's anything useful.

Yeah, let's just stick to 2012-18_playerBoxScore.csv. I'm in the process of getting the download script ready so we can save the data in our repo.

Another question:
In the references, what do features like vegas_diff, team_top_5_ave, above_top_5_ave, etc. mean? Is there any way we can include those as well? They seem to have pretty high correlations with the target.

@Zhang-Haipeng We can try to derive team_top_5_ave, but not vegas_diff, as it is not in our dataset.
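The thread doesn't pin down what team_top_5_ave means in the references, so the sketch below rests on a guess: for each team and game, it averages the five largest values of some per-player feature. The team/date column names `teamAbbr` and `gmDate` are assumptions, and `min_prev_5_ave` is a hypothetical per-player feature (e.g. average minutes over the previous 5 games). Whatever feature is used should come from earlier games only. Feeding in the current game's minutes would leak the target.

```python
import pandas as pd

def add_team_top5_ave(df: pd.DataFrame, value_col: str, k: int = 5) -> pd.DataFrame:
    """Mean of the k largest `value_col` values within each team-game.

    This is one guessed interpretation of team_top_5_ave, not the
    reference's definition. Assumed columns: "teamAbbr", "gmDate".
    """
    df = df.copy()
    df["team_top_5_ave"] = (
        df.groupby(["teamAbbr", "gmDate"])[value_col]
        # nlargest(k) keeps all values when a team has fewer than k players listed
        .transform(lambda s: s.nlargest(k).mean())
    )
    return df

roster = pd.DataFrame({
    "teamAbbr": ["AAA"] * 6 + ["BBB"] * 2,
    "gmDate": ["2012-10-30"] * 8,
    "min_prev_5_ave": [10, 20, 30, 40, 50, 60, 5, 15],
})
print(add_team_top5_ave(roster, "min_prev_5_ave"))
```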