Exploration of methods using R for Data Science
In this project, I work through the methods described in R for Data Science by Hadley Wickham and Garrett Grolemund, as well as Advanced R by Hadley Wickham.
I will also explore some concepts more deeply as I go through the books.
Steps of data science:
- Data import: read data from a file, database, or web API into R or other analytic software
- Tidying: store the data in a consistent form, where each column is a variable and each row is an observation
- Transformation: narrow down to the observations of interest, create new variables from existing ones, and compute summary statistics
- Visualisation: explore questions through visualisation
- Modelling: answer questions through modelling
- Communication: communicate the results
Tidying + Transformation = Wrangling
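The steps above can be sketched end to end in base R. This is only a minimal illustration, not the books' own code: the inline CSV, its values, and the names `csv_text`, `raw`, `tidy`, `yearly`, and `fit` are made up for the example, and the books themselves mostly use tidyverse packages rather than base R.

```r
# Import: read a small inline CSV (made-up values standing in for a real file).
csv_text <- "country,y1999,y2000
A,10,12
B,20,26
C,30,33"
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Tidy: one row per country-year observation, one column per variable.
tidy <- data.frame(
  country = rep(raw$country, times = 2),
  year    = rep(c(1999, 2000), each = nrow(raw)),
  cases   = c(raw$y1999, raw$y2000)
)

# Transform: summarise the mean number of cases per year.
yearly <- aggregate(cases ~ year, data = tidy, FUN = mean)

# Visualise: only draw the plot in an interactive session.
if (interactive()) plot(tidy$year, tidy$cases, xlab = "year", ylab = "cases")

# Model: fit a straight line of cases against year.
fit <- lm(cases ~ year, data = tidy)

# Communicate: report the estimated change in a readable sentence.
cat(sprintf("Estimated change in cases per year: %.2f\n", coef(fit)[["year"]]))
```

The tidy and transform steps together are the wrangling part; everything after them works on the tidy data frame rather than on the raw import.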
Hypothesis generation vs. Hypothesis confirmation
- Hypothesis generation: explore the data to uncover patterns, ask questions about processes, and explain the patterns
- Hypothesis confirmation: develop a precise model that makes falsifiable predictions, then test those predictions on data that was not used to generate the hypothesis
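One way to keep the two modes separate is to split the data before looking at it. The sketch below uses the built-in `mtcars` dataset; the 60/40 split, the seed, and the choice of `wt` and `mpg` are assumptions made for illustration, not a prescription from the books.

```r
set.seed(1)  # arbitrary seed, used only so the split is reproducible

# Reserve part of the data for confirmation before exploring anything.
n <- nrow(mtcars)
explore_idx <- sample(n, size = floor(0.6 * n))
explore <- mtcars[explore_idx, ]
confirm <- mtcars[-explore_idx, ]

# Hypothesis generation: look freely for patterns in the exploration set.
print(cor(explore$wt, explore$mpg))

# Suppose exploration suggests the hypothesis "heavier cars have lower mpg".
# Hypothesis confirmation: test that prediction once, on the held-out rows.
fit <- lm(mpg ~ wt, data = confirm)
print(summary(fit)$coefficients["wt", ])
```

The held-out rows play no part in forming the hypothesis, so the fitted slope for `wt` is a test of the idea rather than a restatement of the pattern that suggested it.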