modeling discussion

Question


Closed this issue 5 years ago · 7 comments

We might have discussed this already, but can we use the player's starting status (bench / starter) directly as a feature? Or should I build a feature from the status over the previous 5 or 10 games?

Answer 1 · 2020-01-19T16:45:51.000Z

Starting status is okay. Many sites report it before a game, so it's reasonable to assume that it would be available. It's a game-day-specific stat. But good question.
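
For comparison, the alternative raised in the question (status over the previous 5 games) could be sketched roughly as below. This is only an illustration: the `player`, `game_date`, `is_starter` and `start_rate_prev5` column names and the toy rows are assumptions, not our actual schema. The `shift(1)` keeps the current game out of its own feature.

```python
import pandas as pd

# Made-up rows for illustration; column names are hypothetical.
games = pd.DataFrame({
    "player": ["A", "A", "A", "A", "B", "B", "B"],
    "game_date": pd.to_datetime([
        "2017-01-01", "2017-01-03", "2017-01-05", "2017-01-07",
        "2017-01-01", "2017-01-03", "2017-01-05",
    ]),
    "is_starter": [1, 1, 0, 1, 0, 0, 1],
})

# Rolling windows only make sense in chronological order per player.
games = games.sort_values(["player", "game_date"])


def prior_start_rate(status: pd.Series, window: int = 5) -> pd.Series:
    """Share of games started over the previous `window` games.

    shift(1) drops the current game from the window, so the feature
    only uses information from earlier games.
    """
    return status.shift(1).rolling(window, min_periods=1).mean()


games["start_rate_prev5"] = (
    games.groupby("player")["is_starter"].transform(prior_start_rate)
)
```

A player's first game has no history, so its value is NaN and would need to be filled or dropped before modeling.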

Answer 2 · 2020-01-19T17:01:31.000Z

@Zhang-Haipeng The dataset that we have includes starting status in the playStat column, stored as a string, either Starter or Bench. So we will have to convert that to 0/1 when we wrangle.
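
A minimal sketch of that conversion with pandas, assuming the values are exactly Starter and Bench. The `is_starter` column name and the toy rows are made up for illustration.

```python
import pandas as pd

# Toy rows; only the playStat column name and its two labels
# come from the dataset description above.
df = pd.DataFrame({"playStat": ["Starter", "Bench", "Starter"]})

# Map the labels to 1/0. Any label not in the dict becomes NaN,
# which makes unexpected values easy to spot.
df["is_starter"] = df["playStat"].map({"Starter": 1, "Bench": 0})
```

Using an explicit mapping instead of a comparison like `== "Starter"` means a misspelled or missing label shows up as NaN rather than silently becoming 0.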

Answer 3 · 2020-01-19T17:06:26.000Z

Thanks.
Another question: I assume 2012-18_playerBoxScore.csv is the only CSV we'll actually use, right?
I've done one version of wrangling/feature engineering. I'll send a PR so you guys can check on it while I continue with the EDA.

Answer 4 · 2020-01-19T17:07:13.000Z

Yes, I think that's the only one we will use. I'll do a quick look through the other datasets to see if there's anything useful.

Answer 5 · 2020-01-19T17:09:16.000Z

Ya, let's just stick to the 2012-18_playerBoxScore.csv. I'm in the process of getting the download script so we can save it in our repo.

Answer 6 · 2020-01-19T20:32:40.000Z

Another question.
In the references, what do features like vegas_diff, team_top_5_ave, above_top_5_ave, etc. mean? Any way we can include those as well? They seem to have pretty high correlations with the target.

Answer 7 · 2020-01-19T23:10:26.000Z

@Zhang-Haipeng We can try to derive team_top_5_ave, but not vegas_diff, as it is not in our dataset.