Web Scraping Premier League Statistics

Goal:

Scrape English Premier League (EPL) player statistics from 2006 to 2018.

I built the entire data set by web scraping. All methods and properties are encapsulated in the PlayerScrapper class.

The pipeline runs as follows:

This step provided a player list and URLs for each of the twenty clubs competing in the EPL for that particular year.
Created a pandas DataFrame with Name, Year, Position, and Nationality and wrote it to a CSV file.
1. Get Club HTMLs
  - Used Selenium with ChromeDriver because the dropdown selection could not be set through a specific URL.
  - Saved HTMLs to "data/epl/epl_clubs/year/clubs/club"
  - Example of webpage: Club
2. Parse Club HTMLs
  - Used BeautifulSoup to extract player/URL key/value pairs from the local HTML files
  - Saved this information as a dictionary in a class variable to be accessed later
  - Constructed a pandas DataFrame with the information below and wrote it to a CSV file
    - Name, Year, Club, Position, Nationality
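The parsing step can be sketched roughly as follows. This is a minimal illustration, not the PlayerScrapper implementation: the sample markup, the `player` class name, the example URLs, and the `parse_club_html` function are all hypothetical, and the real club pages may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tags and class names on the real club pages may differ.
SAMPLE_HTML = """
<ul class="squad">
  <li><a class="player" href="/players/1/example-one/overview">Example One</a></li>
  <li><a class="player" href="/players/2/example-two/overview">Example Two</a></li>
</ul>
"""


def parse_club_html(html):
    """Return a {player name: player URL} dictionary for one club page."""
    soup = BeautifulSoup(html, "html.parser")
    players = {}
    # Each player is assumed to be an <a> tag carrying the (hypothetical) "player" class.
    for link in soup.find_all("a", class_="player"):
        name = link.get_text(strip=True)
        players[name] = link["href"]
    return players


player_urls = parse_club_html(SAMPLE_HTML)
print(player_urls)
```

For a saved page, the same function would be called on the file contents read from the local HTML file instead of `SAMPLE_HTML`.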

For a particular year, I now had two pandas DataFrames that needed to be merged:
1. Club level dataframe with 5 columns (Name, Year, Club, Position, Nationality)
2. Player level dataframe with 58 columns
  - Merged on Name, Year, Position, Nationality
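A minimal sketch of this merge with pandas is shown below. The rows are invented placeholder data, only two of the statistics columns are included, and the inner join is an assumption, since the join type actually used is not stated here.

```python
import pandas as pd

# The four key columns named in the merge step.
MERGE_KEYS = ["Name", "Year", "Position", "Nationality"]

# Placeholder club-level rows (invented for illustration).
club_df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Year": [2017, 2017],
    "Club": ["Club X", "Club Y"],
    "Position": ["Forward", "Goalkeeper"],
    "Nationality": ["England", "Spain"],
})

# Placeholder player-level rows with a small subset of the statistics columns.
player_df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Year": [2017, 2017],
    "Position": ["Forward", "Goalkeeper"],
    "Nationality": ["England", "Spain"],
    "Appearances": [30, 0],
    "Goals": [12, 0],
})

# Assumption: an inner join, which keeps only players present in both dataframes.
merged = club_df.merge(player_df, on=MERGE_KEYS, how="inner")
print(merged)
```

Merging on all four keys rather than on Name alone reduces the chance of joining two different players who share a name.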

Repeat Steps 1-4 for each year from 2006 to 2018, concatenating the resulting dataframes
- This is the range of years for which the player statistics fields were consistent
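The yearly loop might look something like the sketch below. `build_year_dataframe` is a hypothetical stand-in for the per-year steps and returns a one-row placeholder; treating 2018 as inclusive is also an assumption.

```python
import pandas as pd


def build_year_dataframe(year):
    """Hypothetical stand-in for the per-year steps: returns one merged dataframe."""
    return pd.DataFrame({
        "Name": [f"Player {year}"],
        "Year": [year],
        "Appearances": [1],
    })


# Assumption: both 2006 and 2018 are included in the range.
yearly_frames = [build_year_dataframe(year) for year in range(2006, 2019)]

# ignore_index=True gives the combined dataframe a fresh 0..n-1 index.
all_years = pd.concat(yearly_frames, ignore_index=True)
print(all_years.shape)
```

Collecting the yearly dataframes in a list and calling `pd.concat` once avoids repeatedly copying a growing dataframe inside the loop.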

Resulting dataframe: 7473 rows x 59 columns
- Keeping only the rows where the player made at least one appearance that year: 4750 rows x 59 columns
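One way to take that subset is a boolean filter on the 'Appearances' column, as in this small sketch. The rows are invented placeholders, and filtering on `Appearances > 0` is an assumption about how "made an appearance" was defined.

```python
import pandas as pd

# Placeholder rows standing in for the full concatenated dataframe.
all_years = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Year": [2016, 2016, 2017],
    "Appearances": [25, 0, 3],
})

# Keep only player-seasons with at least one appearance.
appeared = all_years[all_years["Appearances"] > 0].reset_index(drop=True)
print(appeared)
```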
Columns:
1. Global:
  - 'Name', 'Year', 'Club', 'Position', 'Appearances', 'Wins', 'Losses', 'Nationality'
2. Attack
  - 'Goals', 'Headed Goals', 'Right Footed Goals', 'Left Footed Goals', 'Hit Woodwork', 'Goals per Match', 'Penalties Scored', 'Freekicks Scored', 'Shots', 'Shots on Target', 'Shooting Accuracy', 'Big Chances Missed'
3. Defence
  - 'Tackles', 'Blocked Shots', 'Interceptions', 'Clearances', 'Headed Clearances', 'Tackle Success', 'Recoveries', 'Duels Won', 'Duels Lost', 'Successful 50/50s', 'Aerials Battles Won', 'Aerial Battles Lost', 'Clean Sheets', 'Goals Conceded', 'Own Goals', 'Errors Lead to a Goal', 'Last Man Tackles', 'Clearances Off the Line'
4. Team Play
  - 'Assists', 'Passes', 'Passes per Game', 'Big Chances', 'Crosses', 'Cross Accuracy', 'Through Balls', 'Accurate Long Balls'
5. Discipline
  - 'Yellows', 'Reds', 'Fouls', 'Offsides'
6. Goalkeeping
  - 'Goalie Goals', 'Saves', 'Penalties Saved', 'Punches', 'High claims', 'Catches', 'Sweeper Clearances', 'Throw Outs', 'Goal Kicks'