-
README.md - This file.
-
run_analysis.R - Loads and processes data from UCI HAR Dataset then exports a summary.
The script run_analysis.R loads the test and training data from the [UCI HAR Dataset](https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles% 2FUCI%20HAR%20Dataset.zip) and produces the output file summary_data.txt.
To run run_analysis.R and create summary_data.txt, the UCI HAR Dataset must first be unpacked so that its top level directory is in the same directory as the run_analysis.R.
The script proceeds as follows:
-
Create tidy matrices based on the data in the "train" and "test" directories using the function load_and_assemble_data. Only a subset of the total data corresponding to the means and standard deviations is included in this file. See codebook section below for details.
-
Combine these two matrices into the full, tidy data set.
-
Use dplyr's group_by and summarise_each to create the summary data. This data contains the averages for each of the observations over groups defined by subject and activity.
-
Make the column headers more descriptive. This involves making the headers match their original forms in "activity_labels.txt" adding the prefix "meanOf-" to each header in order to emphasize that the values presented are group means.
The bulk of the work is done in the function load_and_assemble_data(kind). This function works on either the "test" or "train" data, depending on the value kind. The function:
-
Reads the subject ids.
-
Reads the activity ids and converts them into text values for activities.
-
Reads the measured values and subsets them, keeping only the mean and standard deviation entires.
-
Combine these three parts into one matrix.
For a description of the format of summary_data.txt, please see the file CODEBOOK.md.