a# Yeti-Thesis-Project for my thesis--Yejia Liu
- contains boxscores for each player's first seven years.
- goes from 1998-2011
- total number of players is 3,076.
- Note: there are two "Sean Collins". Only one of them can be found in Kurt's data.
- includes players with 0 NHL games
- stored in ckm_and_exception_mining.SevenSeasons_wilson on cs-oschulte-01
- crawled from web (nhl.com and eliteprospects.com)
- originally contained in nhl_final2.player and nhl_final2.skater_career_stats, now joined and copied to draft_master_table_1
- goes from draft year 1998-2008
- total number of players = 1097 (only players with > 0 NHL games)
- Note: there are two "Alexandre Picard" and two "Mikko Lehtonen". Only one of each can be found in Wilson's data
The join of Wilson + routley is contained in draft_master_table2 (nice to rename like "Wilson_Routley")
- adds central scouting services rank (CSS) for Kurt's 1097 players
- crawled [draftanalyst.com]
- eliminated 2 duplicates, i.e. PlayerId = 8470678 & 8469714
- total number of players 1095
- contained in two views
- ckm_and_exception_mining.NHL_playerStats_1998_2008_view (player features only)
- the above view is join of Wilson + Routley on player name and draft number
- names of 28 players had to be modified in both Wilson & Routley tables to match
- ckm_and_exception_mining.NHL_seasonInfo_1998_2008_view (2772 rows in total)
- ckm_and_exception_mining.NHL_playerStats_1998_2008_view (player features only)
- changed the column names in ckm_and_exception_mining.NHL_seasonInfo_1998_2008_view, such as "CareerGP", "CareerG", etc. into "GP", "G", etc. as these values correspond to a particular seaon type and a particular team for a specific player. With such a name change, a new view is created as
- chao_draft.NHL_seasonInfo_1998_2008_duplicates_view
- deleted columns named "S" and "shotPercentage" as these values are all zeros
- deleted rows with GP =0, G=0, A=0, P=0, etc. for both season types (i.e. Regular Season/Playeroffs)
- deleted rows with false team names for PlayerId = 8467514 & 8470828
- modified rows with duplicate Team names for PlayerId = 8468597
- above changes were made and saved as chao_draft.NHL_seasonInfo_1998_2008 (2644 rows in total)