Node scripts to gather and clean data for an article on the rise of double-barrelled last names in US professional sports.
Clone the repo and run npm i to install the dependencies.
Pulls down HTML pages of alphabetized WNBA players from Basketball Reference and saves them into the output/wnba folder as names-{letter}.html.
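The download step above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual script. It assumes Node 18+ for the global fetch, and the helper names letterPages and downloadAll, the 3-second delay, and the example URL pattern are all hypothetical.

```javascript
// Minimal sketch: download one alphabetized index page per letter.
// Assumptions (hypothetical): Node 18+ (global fetch), the caller-supplied URL
// pattern, and the delay between requests.
const fs = require("node:fs/promises");
const path = require("node:path");

const LETTERS = "abcdefghijklmnopqrstuvwxyz".split("");

// Build the list of pages to fetch and the file each one is saved to.
function letterPages(urlForLetter, outDir) {
  return LETTERS.map((letter) => ({
    letter,
    url: urlForLetter(letter),
    file: path.join(outDir, `names-${letter}.html`),
  }));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch the pages one at a time and save the raw HTML to disk.
async function downloadAll(urlForLetter, outDir, delayMs = 3000) {
  await fs.mkdir(outDir, { recursive: true });
  for (const page of letterPages(urlForLetter, outDir)) {
    const res = await fetch(page.url);
    if (!res.ok) {
      // Some letters may have no index page; skip them instead of aborting.
      console.warn(`Skipping ${page.url}: HTTP ${res.status}`);
      continue;
    }
    await fs.writeFile(page.file, await res.text(), "utf8");
    await sleep(delayMs); // pause between requests to keep the request rate low
  }
}

// Example (assumed URL pattern):
// downloadAll((l) => `https://www.basketball-reference.com/wnba/players/${l}/`, "output/wnba");
```

Fetching sequentially with a pause is slower than firing all 26 requests at once, but it is gentler on sites that throttle automated traffic.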
Compiles and formats all WNBA player names and saves them into output/wnba as names.csv.
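A compile step like the one above might look something like this sketch. It assumes, hypothetically, that every player on a saved index page is an anchor whose href contains "/players/" and ends in ".html". A regex is used only to keep the example dependency-free; a real HTML parser would be sturdier. The function names are illustrative.

```javascript
// Sketch: pull player names out of saved names-{letter}.html pages into one CSV.
// Assumption (hypothetical): player links look like <a href=".../players/x/abc01.html">Name</a>.
const fs = require("node:fs");
const path = require("node:path");

const ENTITIES = { "&amp;": "&", "&#39;": "'", "&quot;": '"', "&lt;": "<", "&gt;": ">" };

// Return the link text of every player anchor, with common entities decoded.
function extractNames(html) {
  const names = [];
  const re = /<a\s+href="[^"]*\/players\/[^"]+\.html"[^>]*>([^<]+)<\/a>/g;
  for (const match of html.matchAll(re)) {
    const text = match[1].replace(/&(amp|#39|quot|lt|gt);/g, (e) => ENTITIES[e]).trim();
    if (text) names.push(text);
  }
  return names;
}

// Quote a field only when it contains a comma, quote, or newline (RFC 4180 style).
function csvField(value) {
  const s = String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Read every names-{letter}.html in dir and write dir/names.csv; returns the row count.
function compileCsv(dir) {
  const files = fs.readdirSync(dir).filter((f) => /^names-[a-z]\.html$/.test(f)).sort();
  const rows = ["name"];
  for (const file of files) {
    const html = fs.readFileSync(path.join(dir, file), "utf8");
    for (const name of extractNames(html)) rows.push(csvField(name));
  }
  fs.writeFileSync(path.join(dir, "names.csv"), rows.join("\n") + "\n", "utf8");
  return rows.length - 1;
}
```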
Pulls down HTML pages of alphabetized NBA players from Basketball Reference and saves them into the output/nba folder as names-{letter}.html.
Compiles and formats all NBA player names and saves them into output/nba as names.csv.
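Formatting a name usually involves splitting it into a first and a last name. One hypothetical way to do that is sketched below: strip generational suffixes, then treat the final space-separated token as the last name. The suffix list is an assumption, and multi-word surnames (for example "Van Something") would need extra rules that this sketch does not attempt.

```javascript
// Sketch: split a full name into first and last names.
// Assumption (hypothetical): the suffix list below covers the suffixes that matter.
const SUFFIXES = new Set(["jr", "jr.", "sr", "sr.", "ii", "iii", "iv"]);

function splitName(fullName) {
  // Drop commas ("Name, Jr.") and split on runs of whitespace.
  const tokens = fullName.trim().replace(/,/g, "").split(/\s+/);
  // Remove trailing suffixes, but never the only remaining token.
  while (tokens.length > 1 && SUFFIXES.has(tokens[tokens.length - 1].toLowerCase())) {
    tokens.pop();
  }
  const last = tokens.pop();
  return { first: tokens.join(" "), last };
}
```

Note that a hyphen in the first name (as in "Mary-Kate Smith") stays in the first field, so it is not mistaken for a hyphenated last name.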
Pulls down HTML pages of alphabetized NFL players from Football Reference and saves them into the output/nfl folder as names-{letter}.html.
Compiles and formats all NFL player names and saves them into output/nfl as names.csv.
Pulls down HTML pages of alphabetized MLB players from Baseball Reference and saves them into the output/mlb folder as names-{letter}.html.
Compiles and formats all MLB player names and saves them into output/mlb as names.csv.
Pulls down HTML pages of alphabetized NHL players from Hockey Reference and saves them into the output/nhl folder as names-{letter}.html.
Compiles and formats all NHL player names and saves them into output/nhl as names.csv.
Compiles previously downloaded MLS player names from output/mls/csvs and saves them into output/mls as names-no-years.csv.
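Compiling a folder of CSV files into a single file, as described above, can be sketched like this. It assumes, hypothetically, that every file in the folder shares the same header row; splitting on line breaks also means fields containing quoted newlines are not supported. The function name mergeCsvDir is illustrative.

```javascript
// Sketch: concatenate every CSV in a folder into one file with a single header.
// Assumptions (hypothetical): identical header rows, no quoted fields spanning lines.
const fs = require("node:fs");
const path = require("node:path");

function mergeCsvDir(srcDir, outFile) {
  const files = fs.readdirSync(srcDir).filter((f) => f.endsWith(".csv")).sort();
  let header = null;
  const body = [];
  for (const file of files) {
    // Accept both \n and \r\n line endings and ignore blank lines.
    const lines = fs
      .readFileSync(path.join(srcDir, file), "utf8")
      .split(/\r?\n/)
      .filter((l) => l.trim() !== "");
    if (lines.length === 0) continue;
    const [fileHeader, ...rows] = lines;
    if (header === null) header = fileHeader;
    else if (fileHeader !== header) throw new Error(`Header mismatch in ${file}`);
    body.push(...rows);
  }
  if (header === null) throw new Error(`No CSV files found in ${srcDir}`);
  fs.writeFileSync(outFile, [header, ...body].join("\n") + "\n", "utf8");
  return body.length; // number of data rows written
}
```

Checking the header of each file catches a season export whose columns changed, which would otherwise silently misalign rows.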
Formats all MLS player names and saves them into output/mls as names.csv.
Pulls down HTML pages of alphabetized NWSL players, one per season, from the NWSL website and saves them into the output/nwsl folder as season-{season}.html.
Compiles all NWSL player names and saves them into output/nwsl as names-no-years.csv.
Formats all NWSL player names and saves them into output/nwsl as names.csv.
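The file names suggest that names-no-years.csv lacks the season information that names.csv carries. Assuming the formatting step attaches each player's first season (which the decade grouping described at the end of this README relies on), a hypothetical way to derive it from per-season rows is shown below. It also assumes a name uniquely identifies a player, so two players sharing a name would be merged.

```javascript
// Sketch: keep one row per player, tagged with the earliest season they appear in.
// Assumptions (hypothetical): rows look like { name, season } with a numeric season,
// and a name identifies exactly one player.
function firstSeasons(rows) {
  const earliest = new Map();
  for (const { name, season } of rows) {
    const prev = earliest.get(name);
    if (prev === undefined || season < prev) earliest.set(name, season);
  }
  // Sort by first season, then alphabetically within a season.
  return [...earliest]
    .map(([name, season]) => ({ name, season }))
    .sort((a, b) => a.season - b.season || a.name.localeCompare(b.name));
}
```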
Pulls down HTML pages of alphabetized US congressional members from congress.gov and saves them into the output/congress folder as names-{page}.html.
Compiles and formats all congressional names and saves them into output/congress as names.csv.
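Congressional listings often show members surname-first. Assuming, hypothetically, that each saved entry reads like "Representative Doe-Smith, Jane A.", the formatting step could split it as follows. The title list and function name are illustrative, and suffixes after a second comma are left in the first-name field.

```javascript
// Sketch: turn a "Title Last, First Middle" member label into first/last fields.
// Assumption (hypothetical): labels follow the "Title Last, First" pattern.
const TITLES = /^(Representative|Senator|Delegate|Resident Commissioner)\s+/;

function parseMemberLabel(label) {
  const cleaned = label.trim().replace(TITLES, "");
  const comma = cleaned.indexOf(",");
  // No comma: treat the whole label as the last name.
  if (comma === -1) return { first: "", last: cleaned };
  return {
    last: cleaned.slice(0, comma).trim(),
    first: cleaned.slice(comma + 1).trim(),
  };
}
```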
Compiles names from all leagues and from Congress and saves them into output as three files: allCombinedNames.csv, which also includes the congressional names; hyphensCombinedNames.csv, which includes only last names with hyphens; and sportsCombinedNames.csv, which includes all sports names without Congress.
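The three combined outputs described above can be derived from one merged list. This sketch assumes, hypothetically, that each source is an array of { first, last } objects keyed by league, that "congress" is the only non-sports key, and that a name counts as hyphenated when its last name contains "-".

```javascript
// Sketch: build the all / hyphens / sports lists from per-source name lists.
// Assumptions (hypothetical): source shape, the "congress" key, and the hyphen test.
function combine(sources) {
  const all = [];
  for (const [league, people] of Object.entries(sources)) {
    // Tag every row with its source so the lists can be filtered later.
    for (const p of people) all.push({ ...p, league });
  }
  return {
    all,
    hyphens: all.filter((p) => p.last.includes("-")),
    sports: all.filter((p) => p.league !== "congress"),
  };
}
```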
Korean names written with the family name before the given name were later manually untagged as hyphenated names, since the hyphen in such names belongs to the given name rather than the surname. Players were grouped into decades by the season in which they played their first professional game. When a season spanned two calendar years (e.g. 1979-1980), the later year determined the decade. The reasons for hyphenation were researched manually and added to sportsCombinedNames_withReasons.csv.
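The decade rule above can be expressed as a small function. This sketch assumes seasons are written as "1979", "1979-1980", or "1979-80"; the function names are illustrative. Under the rule, 1979-1980 counts toward the 1980s.

```javascript
// Sketch of the decade rule: for a season spanning two years, use the later year,
// then round down to the start of its decade.
// Assumption: seasons look like "1979", "1979-1980", or "1979-80".
function seasonEndYear(season) {
  const m = String(season).trim().match(/^(\d{4})(?:\s*-\s*(\d{2}|\d{4}))?$/);
  if (!m) throw new Error(`Unrecognized season: ${season}`);
  const start = Number(m[1]);
  if (!m[2]) return start;
  if (m[2].length === 4) return Number(m[2]);
  // Two-digit end year: attach it to the start year's century,
  // rolling over when needed (e.g. "1999-00" ends in 2000).
  let end = start - (start % 100) + Number(m[2]);
  if (end < start) end += 100;
  return end;
}

function decadeOf(season) {
  const year = seasonEndYear(season);
  return year - (year % 10);
}
```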