Basketball Reference is a great site (especially for a basketball stats nut like me), and hopefully they don't get too pissed off at me for creating this.
Basically, I created this repository as a utility for another project where I'm trying to estimate an NBA player's productivity as it relates to daily fantasy sports. For that project, I need box score and scheduling information, which is provided by this utility.
Here's the PyPI package.
I wrote this library as an exercise for creating my first PyPI package.
Hopefully this means that if you'd like to use this library, you can simply install the package via pip like so:
pip install basketball_reference_web_scraper
This library requires Python 3.4+ and only supports seasons after the 1999-2000 season.
You can import the client like this:
# This imports the client
from basketball_reference_web_scraper import client
There are also a couple of useful enums defined in the `data` module, which can be imported like this:
# This imports the Team enum
from basketball_reference_web_scraper.data import Team
This client has eight methods:
- Getting player box scores by a date (`client.player_box_scores`)
- Getting team box scores by a date (`client.team_box_scores`)
- Getting the schedule for a season (`client.season_schedule`)
- Getting players totals for a season (`client.players_season_totals`)
- Getting players advanced season statistics for a season (`client.players_advanced_season_totals`)
- Getting regular season box scores for a given player and season (`client.regular_season_player_box_scores`)
- Getting the salaries of players of a team for a season (`client.team_salaries`)
- Searching (`client.search`)
You can see all methods used in this repl (https://repl.it/@jaebradley/v300api-examples).
This client also supports three output types:
- Python data types (i.e. a `list` of results)
- JSON
- CSV

Versions >=3 of this client output CSV to a specified file path, and either return JSON output or write it to a specified file path.
- Specify an output type by setting the `output_type` value to `OutputType.JSON` or `OutputType.CSV`
  - The default return value of client methods is a Python data structure (for example, the `player_box_scores` method returns a `list` of `dict`s)
- If you'd like the output to be written to a specific file, set the `output_file_path` variable
  - For CSV output, this variable must be defined
- Specifying an `output_write_option` specifies how the output will be written to the specified file (`OutputWriteOption.WRITE` corresponds to `w`)
  - The default write option is `OutputWriteOption.WRITE`
- Some pieces of data, like a player's team or the outcome of a game, are parsed into enums (the `Team` and `Outcome` enums, respectively, for those two examples)
  - These enums are serialized to strings when outputting to JSON or CSV, but when dealing with Python data structures, you'll see these enum values.
  - Hopefully, these enums make it easier for the `client` user to implement team-specific logic, for example.
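
To make the enum behavior above concrete, here is a minimal sketch of how enum values can be used directly in Python and turned into strings for JSON. The `Team` and `Outcome` classes below are hypothetical stand-ins defined only for this example (their members and string values are assumptions), and the row keys are made up; this is not the library's own serialization code.

```python
import json
from enum import Enum


class Team(Enum):
    # Hypothetical stand-in for the library's Team enum; values are assumed.
    BOSTON_CELTICS = "BOSTON CELTICS"
    CHICAGO_BULLS = "CHICAGO BULLS"


class Outcome(Enum):
    # Hypothetical stand-in for the library's Outcome enum; values are assumed.
    WIN = "WIN"
    LOSS = "LOSS"


def serialize_enum(value):
    """Fallback for json.dumps: turn enum members into their string values."""
    if isinstance(value, Enum):
        return value.value
    raise TypeError("Cannot serialize {!r}".format(value))


# A made-up row shaped like a list-of-dicts result; the keys are illustrative only.
rows = [{"team": Team.BOSTON_CELTICS, "outcome": Outcome.WIN}]

# With Python data structures you can branch on the enum member directly.
celtics_rows = [row for row in rows if row["team"] is Team.BOSTON_CELTICS]

# For JSON output, enum members become plain strings.
as_json = json.dumps(rows, default=serialize_enum)
```

Comparing against an enum member rather than a raw string is what makes team-specific logic less error-prone: a misspelled member name fails immediately instead of silently matching nothing.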
from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import OutputType
# Get all player box scores for January 1st, 2017
client.player_box_scores(day=1, month=1, year=2017)
# Get all player box scores for January 1st, 2017 in JSON format
client.player_box_scores(day=1, month=1, year=2017, output_type=OutputType.JSON)
# Output all player box scores for January 1st, 2017 in JSON format to 1_1_2017_box_scores.json
client.player_box_scores(day=1, month=1, year=2017, output_type=OutputType.JSON, output_file_path="./1_1_2017_box_scores.json")
# Output all player box scores for January 1st, 2017 in CSV format to 1_1_2017_box_scores.csv
client.player_box_scores(day=1, month=1, year=2017, output_type=OutputType.CSV, output_file_path="./1_1_2017_box_scores.csv")
from basketball_reference_web_scraper import client
# Get all team totals for January 1st, 2018
client.team_box_scores(day=1, month=1, year=2018)
# The team_box_scores method also supports all output behavior previously described
from basketball_reference_web_scraper import client
# Get all games for the 2017-2018 season
client.season_schedule(season_end_year=2018)
# The season_schedule method also supports all output behavior previously described
from basketball_reference_web_scraper import client
# Get 2017-2018 season totals for all players
client.players_season_totals(season_end_year=2018)
# The players_season_totals method also supports all output behavior previously described
from basketball_reference_web_scraper import client
# Get 2017-2018 advanced season statistics for all players
client.players_advanced_season_totals(season_end_year=2018)
# Get 2017-2018 advanced season statistics for all players and include advanced statistics for a player
# accumulated over entire course of the season
client.players_advanced_season_totals(season_end_year=2018, include_combined_values=True)
# The players_advanced_season_totals method also supports all output behavior previously described
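
The season-based calls above identify a season by the year it ends in, so the 2017-2018 season is passed as `season_end_year=2018`. As a small illustration, a helper like the following could convert a season label into that value. The function name and the "YYYY-YYYY" label format are assumptions made for this sketch; the helper is not part of the library.

```python
def season_end_year_from_label(season_label):
    """Convert a label like "2017-2018" into the season's end year (2018)."""
    start_text, end_text = season_label.split("-")
    start_year = int(start_text)
    end_year = int(end_text)
    # An NBA season spans two consecutive calendar years.
    if end_year != start_year + 1:
        raise ValueError("Not a valid season label: {!r}".format(season_label))
    return end_year


end_year = season_end_year_from_label("2017-2018")
```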
The structure of the API is due to the unique URL pattern that Basketball Reference has for getting play-by-play data, which depends on the date of the game and the home team.
Example: https://www.basketball-reference.com/boxscores/pbp/201810160BOS.html
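
As a rough sketch of that URL pattern, the example above can be rebuilt from a date and a home team abbreviation. The function name is hypothetical, and the `0` between the date and the abbreviation is inferred from this single example URL, so treat it as an assumption rather than a documented rule.

```python
from datetime import date


def play_by_play_url(game_date, home_team_abbreviation):
    """Build a play-by-play URL from the game date and the home team's abbreviation."""
    # Assumed layout: YYYYMMDD, then a literal "0", then the abbreviation.
    game_key = game_date.strftime("%Y%m%d") + "0" + home_team_abbreviation
    return "https://www.basketball-reference.com/boxscores/pbp/{}.html".format(game_key)


url = play_by_play_url(date(2018, 10, 16), "BOS")
```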
from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import Team
# Get play-by-play data for 2018-10-16 game played at the Boston Celtics
play_by_play = client.play_by_play(
    home_team=Team.BOSTON_CELTICS,
    year=2018,
    month=10,
    day=16,
)
from basketball_reference_web_scraper import client
# Get regular season box scores for Russell Westbrook for the 2018-2019 season
client.regular_season_player_box_scores(
    player_identifier="westbru01",
    season_end_year=2019,
)
# The regular_season_player_box_scores method supports all output behavior previously described
The `player_identifier` is Basketball Reference's unique identifier for each player. In the case of Russell Westbrook, his `player_identifier` is `westbru01` (you can see this from his player page URL: https://www.basketball-reference.com/players/w/westbru01/gamelog/2020).
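
If you have a player page URL handy, the identifier can be pulled out of its path. This is a minimal sketch, assuming player URLs follow the `/players/<letter>/<identifier>` shape seen in the example above; the helper name is hypothetical and not part of the library.

```python
from urllib.parse import urlparse


def player_identifier_from_url(player_page_url):
    """Extract the player identifier from a Basketball Reference player URL."""
    path_segments = [
        segment for segment in urlparse(player_page_url).path.split("/") if segment
    ]
    # Assumed shape: ["players", "<first letter>", "<identifier>", ...]
    if len(path_segments) < 3 or path_segments[0] != "players":
        raise ValueError("Not a player page URL: {!r}".format(player_page_url))
    identifier = path_segments[2]
    # Some player URLs end in "<identifier>.html" instead of a sub-path.
    if identifier.endswith(".html"):
        identifier = identifier[: -len(".html")]
    return identifier


identifier = player_identifier_from_url(
    "https://www.basketball-reference.com/players/w/westbru01/gamelog/2020"
)
```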
from basketball_reference_web_scraper import client
from basketball_reference_web_scraper.data import Team
# Get salaries of all the players on the 1997-1998 Bulls team
# (arguments are passed positionally: the team, then the season's end year)
client.team_salaries(Team.CHICAGO_BULLS, 1998)
# The team_salaries method supports all output behavior previously described
from basketball_reference_web_scraper import client
# Get all results that match "Ko"
client.search(term="Ko")
# The search method supports all output behavior previously described
There are currently two supported major versions: `V3` and `V4`.
There are two branches, `v3` and `v4`, one for each of these major versions. These are the de facto "master" branches to use when making changes.
`master` will reflect the latest major version branch.
Thanks to @DaiJunyan, @ecallahan5, @Yotamho, and @ntsirakis for their contributions!