jldbc/pybaseball

pitching_stats(year) not working for some seasons

pyradical opened this issue · 2 comments

I get an empty DataFrame for 2013, 2014, and 2016 when I print `player_series`. I'm relatively new to baseball and pybaseball, so I double-checked that he played in those years, and he certainly did.

```python
import pandas as pd
import pybaseball as pb

# Look up Gerrit Cole's IDs and the span of his MLB career
gerrit_cole = pb.playerid_lookup('cole', 'gerrit')
gerrit_fg = gerrit_cole.iloc[0]['key_fangraphs']

first_season = int(gerrit_cole.iloc[0]['mlb_played_first'])
last_season = int(gerrit_cole.iloc[0]['mlb_played_last']) + 1

season_range = range(first_season, last_season)

columns = ['ERA-']
cole_df = pd.DataFrame(columns=columns)
for year in season_range:
    print(year)
    # Fetch the season's pitching leaderboard, then keep only Cole's row
    season_df = pb.pitching_stats(year)
    player_series = season_df[season_df['Name'] == 'Gerrit Cole']
    print(player_series)
```

Is there something I'm doing incorrectly here?

By default, `pitching_stats` applies the FanGraphs requirement for "qualified" pitchers: 1 IP per team game played. Gerrit Cole didn't pitch enough innings to meet that threshold in 2013, 2014, or 2016, so he is filtered out of those seasons. You can set your own IP threshold with the argument `qual=<your threshold here>`, though this can slow things down a lot.
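To see why a season drops out, the qualification rule can be checked by hand. Below is a minimal sketch, assuming the 1-IP-per-team-game rule described above, a 162-game schedule as the default, and the usual box-score IP notation where `.1` and `.2` mean one and two thirds of an inning. The helper names `innings_to_outs`, `is_qualified`, and `fetch_all_pitchers` are hypothetical, and the IP values in the usage lines are made-up illustrations, not Cole's actual totals. The fetch helper just passes `qual=0` through to `pitching_stats` so that no pitcher is filtered out.

```python
def innings_to_outs(ip: float) -> int:
    """Convert box-score IP notation (150.2 = 150 2/3 innings) to outs recorded."""
    whole = int(ip)
    thirds = round((ip - whole) * 10)  # .1 -> 1 extra out, .2 -> 2 extra outs
    return whole * 3 + thirds


def is_qualified(ip: float, team_games: int = 162) -> bool:
    """Assumed rule: at least 1 inning pitched (3 outs) per team game played."""
    return innings_to_outs(ip) >= team_games * 3


def fetch_all_pitchers(year: int):
    """Fetch one season with the IP threshold disabled (network call, slower)."""
    import pybaseball as pb  # imported lazily so the helpers above work without it

    return pb.pitching_stats(year, qual=0)


# Hypothetical season lines, not real stats
print(is_qualified(120.0))  # False: well short of 162 innings
print(is_qualified(162.0))  # True: exactly 1 IP per game
print(is_qualified(161.2))  # False: one out short of 162 innings
```

Working in outs rather than decimal innings avoids treating `161.2` as 161.2 innings, which would overstate the total.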

There's also a completely undocumented `players` argument for looking up data on particular players only. For example, `pb.pitching_stats(start_season=1900, end_season=2024, split_seasons=True, players="13125,19361", qual=0)` returns every season for both Gerrit Cole (13125) and Corbin Burnes (19361).
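The IDs in that call are FanGraphs player IDs, the same values `playerid_lookup` returns in its `key_fangraphs` column. The sketch below builds the comma-separated `players` string from a list of those IDs and wraps the call above. `players_arg` and `fetch_player_seasons` are hypothetical helper names, and because `players` is undocumented, its behavior may change between pybaseball versions.

```python
from typing import Iterable


def players_arg(fangraphs_ids: Iterable[int]) -> str:
    """Join FanGraphs player IDs into the comma-separated string used for `players`."""
    # int() normalizes numpy integers or floats such as 13125.0 from a DataFrame column
    return ",".join(str(int(pid)) for pid in fangraphs_ids)


def fetch_player_seasons(fangraphs_ids: Iterable[int], start: int = 1900, end: int = 2024):
    """One row per season for the given pitchers, with no IP threshold (network call)."""
    import pybaseball as pb  # lazy import: pybaseball is only needed when this is called

    return pb.pitching_stats(
        start_season=start,
        end_season=end,
        split_seasons=True,
        players=players_arg(fangraphs_ids),
        qual=0,
    )


print(players_arg([13125, 19361]))  # 13125,19361
```

With the lookup from the question, `fetch_player_seasons([gerrit_fg])` should return Cole's seasons, including the ones below the qualified threshold, in a single request instead of one request per year.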


Thank you...so much for this!