Getting and Cleaning Data - Course Project
This file describes the code in run_analysis.R. See CodeBook.md for more on the data itself.
The structure of the code is as follows.
utilities
data_dir gives the relative location of the directory containing the data. If the supplied zip file is unzipped in the current directory, nothing needs changing.
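As a rough illustration of how a relative data directory is typically used, the sketch below builds a path to one input file. The directory name and the file layout shown are assumptions for the example; the actual value of data_dir is whatever run_analysis.R defines.

```r
# Assumed value for illustration only; check run_analysis.R for the real one.
data_dir <- "UCI HAR Dataset"

# file.path() joins the components with "/" regardless of platform,
# so the same code works wherever the zip file was extracted.
train_file <- file.path(data_dir, "train", "X_train.txt")
```

If the data lives elsewhere, only the data_dir assignment should need to change.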
merged_data
This function implements the first step of the project: the data from the training and test data sets are loaded into a single data table. No metadata about the source of each observation is retained, as this is not needed for the remainder of the project.
Additional columns for the activity and subject of each observation are added to the data table.
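The merge described above might look roughly like the following. This is a minimal base R sketch on small in-memory stand-ins rather than the real files, and every object name in it is hypothetical; the actual function may well use a different data structure or package.

```r
# Toy stand-ins for the training and test measurement files (made-up values).
train_x <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
test_x  <- data.frame(V1 = 0.3, V2 = 1.3)

# Toy stand-ins for the per-row subject and activity files.
train_subject  <- c(1, 1)
test_subject   <- 2
train_activity <- c(5, 4)
test_activity  <- 5

# Stack the training rows on top of the test rows. Nothing records which
# set a row came from, matching the description above.
merged <- rbind(train_x, test_x)

# Attach subject and activity in the same row order as the stacked data.
merged$subject  <- c(train_subject, test_subject)
merged$activity <- c(train_activity, test_activity)
```

The important detail is that the subject and activity vectors are concatenated in the same order as the measurement rows, so each observation keeps its own labels.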
extract_features
This removes columns from the supplied dataset whose names do not include the strings "std" or "mean". The rubric is slightly unclear about exactly which variables should be retained.
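A substring filter of this kind can be sketched in base R as follows. The column names are examples in the style of features.txt, and keeping the subject and activity columns is an assumption of this sketch, not something stated above.

```r
# One-row toy dataset; check.names = FALSE keeps the punctuation in the names.
d <- data.frame(
  `tBodyAcc-mean()-X`     = 1,
  `tBodyAcc-std()-X`      = 2,
  `tBodyAcc-max()-X`      = 3,
  `fBodyAcc-meanFreq()-X` = 4,
  subject                 = 1,
  activity                = 5,
  check.names = FALSE
)

# Keep a column if its name contains "mean" or "std" anywhere,
# or if it is one of the two identifier columns (assumed to be kept).
keep <- grepl("mean|std", names(d)) | names(d) %in% c("subject", "activity")
d_small <- d[, keep, drop = FALSE]
```

Note that a plain substring match also keeps a name such as fBodyAcc-meanFreq()-X. Whether such variables belong in the result is the sort of question the rubric leaves open; a stricter pattern such as "mean\\(\\)|std\\(\\)" would exclude them.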
rename_activities
This replaces the activities column values with a factor containing more descriptive activity names as supplied with the original data.
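One common way to do this kind of replacement is factor() with explicit levels and labels. The lookup table below is a shortened, hypothetical stand-in for the labels supplied with the data, and the variable names are illustrative.

```r
# Hypothetical id-to-name lookup (the real label file has more entries).
labels <- data.frame(
  id   = 1:3,
  name = c("WALKING", "SITTING", "STANDING"),
  stringsAsFactors = FALSE
)

# Numeric activity codes as they might appear in the merged data.
activity <- c(2, 1, 3, 2)

# levels gives the codes to look for; labels gives the name for each code.
activity_f <- factor(activity, levels = labels$id, labels = labels$name)
```

Using the lookup table for both levels and labels keeps the mapping in one place and avoids hard-coding the names in the script.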
rename_variables
This gives the variables more meaningful column headings. See the file features.txt for the names and the file features_info.txt for information about the meaning of each feature.
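As an illustration, renaming can be done by looking up each default column name in a feature table. The sketch assumes the columns still carry R's default V1, V2, ... names and that the feature table has an index and a name column; both are assumptions, and the real function may work differently.

```r
# Hypothetical two-row excerpt in the shape of features.txt.
features <- data.frame(
  index = 1:2,
  name  = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X"),
  stringsAsFactors = FALSE
)

# Data with default column names, as read.table() would produce.
d <- data.frame(V1 = 0.1, V2 = 0.2)

# Strip the leading "V" to recover each column's feature index,
# then replace the name with the matching feature name.
idx <- as.integer(sub("^V", "", names(d)))
names(d) <- features$name[idx]
```

Looking names up by index rather than by position keeps the mapping correct even after some columns have been dropped.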
do_steps
This performs all the steps described above to create a new data table from the data on disk.
averages
Creates a new data table that averages each variable by subject and activity. Since there are 30 subjects and 6 activities, we get a total of 180 rows.
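Grouped averaging of this kind can be sketched with base R's aggregate(). The data and column names below are toy values, and the actual function may use a different approach.

```r
# Toy data: subject 1 has two WALKING rows and one SITTING row,
# subject 2 has a single WALKING row.
d <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  x        = c(1, 3, 5, 7),
  y        = c(2, 4, 6, 8)
)

# Average every measurement column within each subject/activity pair.
avg <- aggregate(cbind(x, y) ~ subject + activity, data = d, FUN = mean)
```

The result has one row per subject/activity combination that actually occurs in the data (three here). With 30 subjects who each performed all 6 activities, the same call would yield 30 × 6 = 180 rows.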
write_td
Creates the tiny_data.txt file, suitable for reading back into R. Note that this is "wide" - we have not melted out the variables, but the rubric allows for either wide or narrow summary data.
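A write-and-read-back round trip for a wide table might look like this. The sketch writes to a temporary file rather than the real output file, and the table contents are made up.

```r
# Small wide table: one row per subject/activity, one column per variable.
avg <- data.frame(
  subject  = c(1, 2),
  activity = c("WALKING", "SITTING"),
  x        = c(0.5, 0.25)
)

# Write without row names so the header lines up with the columns.
out <- tempfile(fileext = ".txt")
write.table(avg, out, row.names = FALSE)

# header = TRUE tells read.table() to take column names from the first line.
back <- read.table(out, header = TRUE)
```

Reading the file back with header = TRUE is a quick check that the output really is suitable for loading into R again.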