The script performs the following actions and transformations:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive variable names.
- From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
A reference description of the dataset can be found here: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
Dependencies
Packages required to run run_analysis.R:
- dplyr
- data.table
General Flow
First, get_features() is called to read the column names from the description file. Only the column names that include mean and standard deviation are returned.
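
The filtering step could look roughly like the sketch below. This is not the actual body of get_features(): the inline `features_txt` lines are a hypothetical stand-in for the description file, and the pattern assumes that "mean and standard deviation" means names containing `mean()` or `std()`.

```r
# Hypothetical stand-in for the description file (column index, column name).
features_txt <- c(
  "1 tBodyAcc-mean()-X",
  "2 tBodyAcc-mean()-Y",
  "4 tBodyAcc-std()-X",
  "7 tBodyAcc-mad()-X"
)

# Parse the two whitespace-separated columns.
features <- read.table(text = features_txt,
                       col.names = c("index", "name"),
                       stringsAsFactors = FALSE)

# Assumption: only names containing "mean()" or "std()" are wanted.
keep <- grepl("mean\\(\\)|std\\(\\)", features$name)
selected <- features[keep, ]
print(selected)
```

Keeping the index next to the name matters later, because the raw measurement files have no header and their columns can only be addressed by position.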
Then map_labels() is called. It takes both files from a given directory (e.g. X_test.txt and y_test.txt) and merges them on the column nr (the row number), which I had to create first. This nr column is then dropped using select from the dplyr package.
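
A minimal sketch of that merge, assuming two small in-memory data frames in place of the real files. The names `x`, `y`, `activity` and `labelled` are illustrative and not taken from the script, and base `merge()` is an assumption about how the join is done.

```r
library(dplyr)

# Hypothetical stand-ins for X_test.txt (measurements) and y_test.txt (activity codes).
x <- data.frame(V1 = c(0.28, 0.26, 0.27), V2 = c(-0.02, -0.03, -0.01))
y <- data.frame(activity = c(5L, 1L, 4L))

# Add an explicit row-number key to both tables.
x$nr <- seq_len(nrow(x))
y$nr <- seq_len(nrow(y))

# Merge on the key, then drop the key again with select().
labelled <- merge(y, x, by = "nr") %>%
  select(-nr)
print(labelled)
```

Because both files are aligned row by row, `cbind(y, x)` would give the same result here; the explicit key just makes the pairing visible.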
Then the filter_features() function operates on the features list, using select again. Note that this method declares its own rename method, so that I don't have to write the column names manually. The effect is the same as writing V1 = "tBodyAcc-mean()-X" for every column, which becomes very verbose.
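
One way to get the same effect without typing each pair by hand is to build a named vector and pass it to select() through all_of(). This is a sketch of the idea and swaps in a different technique from the script's own rename helper; `features`, `raw` and `lookup` are hypothetical names.

```r
library(dplyr)

# Hypothetical selected features: position in the raw table and descriptive name.
features <- data.frame(index = c(1L, 4L),
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X"),
                       stringsAsFactors = FALSE)

# Raw measurements with the default names read.table() assigns: V1, V2, ...
raw <- data.frame(V1 = c(0.28, 0.26), V2 = c(-0.02, -0.03),
                  V3 = c(-0.11, -0.12), V4 = c(-0.99, -0.98))

# Named vector of the form c(new_name = "old_name").
lookup <- setNames(paste0("V", features$index), features$name)

# Keep only the wanted columns and rename them in one step.
out <- select(raw, all_of(lookup))
print(names(out))
```

The named vector plays the role of the hand-written `V1 = "tBodyAcc-mean()-X"` pairs, but it is generated from the features list.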
Finally, the train and test sets are combined into one data set. A second data set is a deep table containing the average of each variable per activity:
row.names | walking | walking_upstairs | walking_downstairs | sitting | standing | laying |
---|---|---|---|---|---|---|
tBodyAcc-mean()-X | 0.276336875 | 0.262294649 | 0.2881372278 | 0.273059614 | 0.279153494 | 0.268648643 |
tBodyAcc-mean()-Y | -0.017906833 | -0.025923289 | -0.0163119255 | -0.012689573 | -0.016151886 | -0.018317728 |
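
A table of this shape could be produced along the following lines. This is a base R sketch on made-up numbers, not the script's own code, and it averages per activity only; `combined`, `means` and `deep` are hypothetical names.

```r
# Hypothetical combined data: one activity label plus two measurement columns.
combined <- data.frame(
  activity = c("walking", "walking", "sitting", "sitting"),
  "tBodyAcc-mean()-X" = c(0.25, 0.75, 0.5, 1.0),
  "tBodyAcc-mean()-Y" = c(-0.25, -0.75, 0.0, -0.5),
  check.names = FALSE
)

# Average every measurement column within each activity.
means <- aggregate(combined[-1],
                   by = list(activity = combined$activity),
                   FUN = mean)

# Transpose so that variables become rows and activities become columns.
deep <- t(means[-1])
colnames(deep) <- means$activity
print(deep)
```

Grouping by subject as well would only require adding a second entry to the `by` list.
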
For now the column names are readable, but they must be quoted when accessed:
out$"tBodyAcc-mean()-X"
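
The quoted form is one of several equivalent ways to reach a column whose name is not a syntactic R name. The sketch below uses a tiny hypothetical `out` data frame to show them side by side.

```r
# Hypothetical one-column result; check.names = FALSE keeps the name as written.
out <- data.frame("tBodyAcc-mean()-X" = c(0.28, 0.26), check.names = FALSE)

# Three equivalent ways to access the column.
quoted     <- out$"tBodyAcc-mean()-X"
backticked <- out$`tBodyAcc-mean()-X`
bracketed  <- out[["tBodyAcc-mean()-X"]]

print(identical(quoted, backticked) && identical(quoted, bracketed))
```

The `[[ ]]` form is the most convenient when the column name is held in a variable.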