Course project for Getting and Cleaning Data, part of the Data Science Specialization.
The script run_analysis.R prepares the tidy data for later analysis.
To run it, you must have recent versions of the following packages installed: magrittr (at least 1.0.1), dplyr (at least 0.3.0.2), and stringr (at least 0.6.2).
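As a quick way to check these requirements before running the script, something like the following could be used. This is only a sketch: the helper `missing_or_outdated` is a hypothetical name and is not part of run_analysis.R. It relies on the base R functions `requireNamespace` and `packageVersion`.

```r
# Minimum versions taken from the requirements listed above.
required <- c(magrittr = "1.0.1", dplyr = "0.3.0.2", stringr = "0.6.2")

# Hypothetical helper (not part of run_analysis.R): returns the names of the
# packages that are either not installed or older than the required version.
missing_or_outdated <- function(required) {
  ok <- vapply(names(required), function(pkg) {
    requireNamespace(pkg, quietly = TRUE) &&
      packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
  names(required)[!ok]
}

to_fix <- missing_or_outdated(required)
if (length(to_fix) > 0) {
  message("Install or update: ", paste(to_fix, collapse = ", "))
}
```

Any package reported here can be installed or updated with `install.packages`.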
The script is divided into three functions that together carry out the steps of the analysis:
- retrieve_raw_data - This function downloads and extracts the zip file containing the raw data. The parameter raw_data_dir defines the directory where the data will be extracted. Unless force is set to TRUE, the data will not be downloaded or unzipped again if it already exists on disk. The function returns the date and time when the raw data was downloaded.
- tidy_raw_data - This function takes the directory containing the raw data as input and returns the tidy data set obtained after performing steps 1, 2, 3, and 4 of the analysis.
- summarize_tidy_data - This function performs step 5 of the analysis. Its input is the data.frame returned by tidy_raw_data, and it creates the second tidy data set with the average of each variable for each activity and each subject. This data set can be written to disk using write.table.
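The "download only if needed" behaviour described for retrieve_raw_data can be sketched roughly as follows. This is not the code in run_analysis.R: the function name `retrieve_sketch`, the `url` argument, and the use of the directory's modification time as the returned timestamp are all assumptions made for illustration.

```r
# Hypothetical sketch of the caching logic; not the actual implementation.
retrieve_sketch <- function(raw_data_dir, url, force = FALSE) {
  if (force || !dir.exists(raw_data_dir)) {
    zip_file <- tempfile(fileext = ".zip")
    download.file(url, zip_file, mode = "wb")
    # Assumption: the archive contains raw_data_dir as its top-level folder,
    # so it is extracted into the parent directory.
    unzip(zip_file, exdir = dirname(raw_data_dir))
    unlink(zip_file)
  }
  # Assumption: the directory's modification time stands in for the
  # download time when the data was already on disk.
  file.info(raw_data_dir)$mtime
}
```

With force left at FALSE, calling the function on a directory that already exists skips the download entirely and only reports the timestamp.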
The three functions are called together at the end of run_analysis.R as follows:

    raw_data_dir <- "UCI HAR Dataset"
    retrieve_raw_data(raw_data_dir)
    tidy_data <- tidy_raw_data(raw_data_dir)
    summarized_tidy_data <- summarize_tidy_data(tidy_data)
    write.table(summarized_tidy_data, "summarized_tidy_data.txt", row.names = FALSE)

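To illustrate what the last two calls produce, here is a small sketch on made-up data. The column names (`subject`, `activity`, `feature_a`, `feature_b`) and values are assumptions, and base R `aggregate` is used in place of dplyr so that the example runs without extra packages. The actual summarize_tidy_data may be implemented differently.

```r
# Toy stand-in for the tidy data; column names and values are made up.
tidy_toy <- data.frame(
  subject   = c(1, 1, 2, 2),
  activity  = c("WALKING", "WALKING", "SITTING", "SITTING"),
  feature_a = c(0.2, 0.4, 1.0, 3.0),
  feature_b = c(10, 20, 30, 50)
)

# Average every remaining column for each subject/activity pair.
summarized_toy <- aggregate(. ~ subject + activity, data = tidy_toy, FUN = mean)

# Write the result the same way as the call above, then read it back.
out_file <- tempfile(fileext = ".txt")
write.table(summarized_toy, out_file, row.names = FALSE)
read_back <- read.table(out_file, header = TRUE)
```

Because write.table is called with row.names = FALSE, the file has to be read back with header = TRUE so that the first line is treated as column names.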
For information about the generated tidy data set, please see the file CodeBook.md.
Yasser Gonzalez
- Homepage - http://yassergonzalez.com
- Email - contact@yassergonzalez.com