The script run_analysis.R
performs the following steps, the scope being a tidy data set, according to the rules suggested in Hadley Wickham's tidy data paper:
- Reads the data sets, along with the dictionaries:
- subject variable:
UCI HAR Dataset/train/subject_train.txt
andUCI HAR Dataset/test/subject_test.txt
- activity variable:
UCI HAR Dataset/train/y_train.txt
andUCI HAR Dataset/test/y_test.txt
- feature measurements:
UCI HAR Dataset/train/X_train.txt
andUCI HAR Dataset/test/X_test.txt
- dictionaries: activities and features:
UCI HAR Dataset/activity_labels.txt
andUCI HAR Dataset/features.txt
- subject variable:
- Merges the training and the test sets to create one data set.
- Merging colwise and rowwise.
- The resulting data frame
bound.all
has a wide format: 10299 observations od 563 variables (1 subject, 1 activity and 561 feature variables). - The data is then converted to a long format
tidy.all
with 4 columns: subject, activity, feature id, and value.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Out of 561 features, only 66 are selected. The filtered
tidy.all
data frame has 679734 rows and 4 columns.
- Out of 561 features, only 66 are selected. The filtered
- Uses descriptive activity names to name the activities in the data set.
- The activities dictionary is used.
- Appropriately labels the data set with descriptive variable names.
- Each feature gets a descriptive label from the features dictionary.
- From the data set in previous step, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
- The resulting data frame
tidy.mean
is eventually written to the filetidy_mean.txt
.
- The resulting data frame