The project had the following requirements:
- Merge the training and test sets to create one data set
- Extract only the mean and standard deviation measurements
- Use descriptive activity names to name the activities
- Appropriately label the data set with descriptive variable names
- From that data set, create a second, independent data set with the average of each variable for each activity and each subject
The script run_analysis.R accomplishes these tasks with the following steps:
- Reads in features.txt to get the feature names
- Reads in activity_labels.txt to get the activity names
- Reads the training data:
  - Reads subject_train.txt to get the IDs of the volunteers in the training set
  - Reads y_train.txt to get the activities performed
  - Translates the activity IDs to names
  - Reads X_train.txt to get the feature data and applies the feature names to the columns
  - Selects only the mean and std columns
- Reads the test data:
  - Reads subject_test.txt to get the IDs of the volunteers in the test set
  - Reads y_test.txt to get the activities performed
  - Translates the activity IDs to names
  - Reads X_test.txt to get the feature data and applies the feature names to the columns
  - Selects only the mean and std columns
- Combines the two data sets and writes the result to combined_dataset.txt
- Groups the combined data by volunteer and activity and computes the mean of every feature for each group
- Writes this new data set to tidy_dataset.txt
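The steps above can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not the contents of run_analysis.R: it uses base R's read.table and aggregate in place of whatever reshape/plyr functions the script relies on, assumes the standard UCI HAR file layout (features.txt and activity_labels.txt at the top level, with train/ and test/ subfolders), and treats a column as a mean or std measurement only when its name contains "mean()" or "std()". The function names read_split and run_analysis_sketch, the data_dir and out_dir parameters, and the "UCI HAR Dataset" default folder name are all hypothetical.

```r
# A minimal base-R sketch of the steps above; NOT the actual run_analysis.R.
# Assumes the UCI HAR layout: <data_dir>/features.txt, <data_dir>/activity_labels.txt
# and <data_dir>/<set>/{subject,y,X}_<set>.txt for set = "train" and "test".

# Hypothetical helper: read one split (train or test) and keep mean()/std() columns.
read_split <- function(data_dir, set, features, activities) {
  split_file <- function(prefix) {
    file.path(data_dir, set, paste0(prefix, "_", set, ".txt"))
  }
  subject  <- read.table(split_file("subject"), col.names = "subject")
  activity <- read.table(split_file("y"), col.names = "activity_id")
  # Apply the feature names to the columns; check.names = FALSE keeps them verbatim
  x <- read.table(split_file("X"), col.names = features, check.names = FALSE)
  # Assumption: a mean/std column is one whose name contains "mean()" or "std()"
  keep <- grepl("mean()", features, fixed = TRUE) |
    grepl("std()", features, fixed = TRUE)
  data.frame(
    subject  = subject$subject,
    # Translate activity IDs into descriptive names via activity_labels.txt
    activity = activities$name[match(activity$activity_id, activities$id)],
    x[, keep, drop = FALSE],
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Hypothetical entry point: build the combined and tidy data sets and write both.
run_analysis_sketch <- function(data_dir = "UCI HAR Dataset", out_dir = ".") {
  features <- read.table(file.path(data_dir, "features.txt"),
                         col.names = c("index", "name"),
                         stringsAsFactors = FALSE)$name
  activities <- read.table(file.path(data_dir, "activity_labels.txt"),
                           col.names = c("id", "name"),
                           stringsAsFactors = FALSE)
  combined <- rbind(read_split(data_dir, "train", features, activities),
                    read_split(data_dir, "test", features, activities))
  write.table(combined, file.path(out_dir, "combined_dataset.txt"),
              row.names = FALSE)

  # Mean of every feature for each (subject, activity) group
  feature_cols <- setdiff(names(combined), c("subject", "activity"))
  tidy <- aggregate(combined[feature_cols],
                    by = list(subject = combined$subject,
                              activity = combined$activity),
                    FUN = mean)
  write.table(tidy, file.path(out_dir, "tidy_dataset.txt"), row.names = FALSE)
  list(combined = combined, tidy = tidy)
}
```

Called as run_analysis_sketch() from the directory holding the unzipped data, it would return both tables in a list and write them as combined_dataset.txt and tidy_dataset.txt. Reading each split through one helper keeps the training and test steps identical, mirroring how the two halves of the list above repeat the same operations.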
A codebook for the tidy data set is available in tidy_dataset_info.md.
Before running the script:
- The working directory must contain the R script and the unzipped original dataset.
- The script requires the following libraries: reshape, reshape2 and plyr.
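One way to confirm that the three libraries are installed before sourcing the script is a small base-R check like the one below. It is a sketch, not part of the project: check_packages is a hypothetical helper, and it only reports missing packages rather than installing them.

```r
# Hypothetical helper: return the packages from `pkgs` whose namespaces cannot be loaded.
check_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

missing_pkgs <- check_packages(c("reshape", "reshape2", "plyr"))
if (length(missing_pkgs) > 0) {
  # Report rather than install, so the check has no side effects
  message("Install before running: ", paste(missing_pkgs, collapse = ", "))
}
```

Any packages it reports can then be installed with install.packages(missing_pkgs).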
Then run the script from an R session:

```r
source("run_analysis.R")
```