SAS analysis using Pokemon as a dataset
- Source data (Pokemon.csv) comes from the pokemonData repo, which is a fork of another pokemonData repo that extends a Kaggle dataset
- Cleaned data (pokemon_clean.csv) is the result of running Clean Data on Pokemon.csv
Columns (13): Pokedex Number, Name, Type1, Type2, Total, HP, Attack, Defense, Sp. Attack, Sp. Def, Speed, Generation, Legendary
- Import Data, which needs to be updated to point to the specific path of Pokemon.csv
- Apparently relative paths aren't really a thing in SAS due to the runtime quirks of SAS Studio
- It puts things in a tmp folder, so you can never be too sure of what the path actually is going to be
- To me, that sounds like the exact use case for relative paths, but the SAS Community forums say that is not the case
- The data does need to be explicitly imported rather than using SAS Studio shortcuts, because the shortcuts did not interpret Type1 as a length-8 column
- Clean Data, which looks for invalid data and gets it into a better state for future steps
- Changes for incorrect data:
- Updates all non-applicable Type2 values to be empty strings rather than 'NA'
- Updates the values for 25 Pokemon from generations 1-6 that were rebalanced with generation 7
- Updates all instances of é to have the appropriate character (Flabébé is a Pokemon name)
- Attempts to update instances of ♀ and ♂ to include the gender symbols, but those are not supported in SAS Studio's charset
- At least, for the version of SAS Studio I was using
- The results of the above changes can be found in the proc compare RTF output
- Removes duplicate entries for each Pokedex number
- This removes all Mega Evolutions and alternate forms
- This step had to be done after proc compare, otherwise the change in indexes made that chart useless
- Analyze Data, which runs various types of analysis on the variables
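
The Import Data and Clean Data steps described above might look roughly like the following SAS sketch. It is not the repository's actual code. The file paths, the dataset names (pokemon_raw, pokemon_clean, pokemon_dedup), the SAS variable names, the UTF-8 encoding, and every length other than Type1's length of 8 are assumptions made for illustration. Only the 'NA' cleanup is shown, and the de-duplication assumes the base form of each Pokemon is listed before its Mega Evolutions and alternate forms.

```sas
/* Hypothetical absolute path: SAS Studio needs the full location of Pokemon.csv. */
%let csv_path = /home/your_user/Pokemon.csv;

/* Explicit import: lengths are declared up front so Type1 is read as length 8. */
/* Variable names and the other lengths are assumptions, not the repo's own.    */
data work.pokemon_raw;
    infile "&csv_path" dsd firstobs=2 truncover encoding='utf-8';
    length Number 8 Name $40 Type1 $8 Type2 $8
           Total HP Attack Defense SpAtk SpDef Speed Generation 8
           Legendary $5;
    input Number Name Type1 Type2 Total HP Attack Defense
          SpAtk SpDef Speed Generation Legendary;
run;

/* Clean step (partial): replace the 'NA' placeholder with a blank Type2. */
data work.pokemon_clean;
    set work.pokemon_raw;
    if Type2 = 'NA' then Type2 = '';
run;

/* Compare before removing any rows, while observations still line up one to one. */
/* The RTF path is hypothetical.                                                  */
ods rtf file="/home/your_user/compare.rtf";
proc compare base=work.pokemon_raw compare=work.pokemon_clean;
run;
ods rtf close;

/* Keep the first row per Pokedex number. This assumes the base form comes first, */
/* so Mega Evolutions and alternate forms are the rows that get dropped.          */
proc sort data=work.pokemon_clean out=work.pokemon_dedup nodupkey;
    by Number;
run;
```

Running proc compare before proc sort with nodupkey mirrors the ordering constraint noted above: once rows are removed, a positional comparison against the raw data no longer matches observation for observation.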
The original workbook can be found at PokemonDataAnalysis.twb, but the experience is better with the bundled version at PokemonDataAnalysis.twbx
Overall, there are 5 unique visualizations, 4 of which also have versions that exclude legendary Pokemon.
- Max Total Stats By Type
- Median Total Across Generations
- Pokemon Attack and Defense
- Pokemon Sp. Attack and Sp. Defense
- Pokemon Total Stats
A write-up of the above charts can be found in Visualizations.md