After downloading and reading in the raw data sets, I use cbind and rbind to combine the following six files.
- train/subject_train.txt
- train/X_train.txt
- train/y_train.txt
- test/subject_test.txt
- test/X_test.txt
- test/y_test.txt
I did not include a separate column for the data source (i.e., train/test), because it can be told from the subject ID.
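
The combination step can be sketched roughly as follows. The small data frames are made-up stand-ins for the six files (in the real script they would come from something like read.table), and the column names "subject" and "activity" are assumptions for illustration.

```r
# Toy stand-ins for the six files; the values are made up for illustration.
subject_train <- data.frame(subject = c(1, 1, 3))
x_train       <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(1.1, 1.2, 1.3))
y_train       <- data.frame(activity = c(5, 4, 1))

subject_test  <- data.frame(subject = c(2, 2))
x_test        <- data.frame(V1 = c(0.4, 0.5), V2 = c(1.4, 1.5))
y_test        <- data.frame(activity = c(2, 3))

# cbind places the three files of each set side by side (same row count).
train <- cbind(subject_train, y_train, x_train)
test  <- cbind(subject_test,  y_test,  x_test)

# rbind stacks the two sets on top of each other (same columns, same order).
merged <- rbind(train, test)
```

Doing cbind first and rbind second keeps each subject and activity row aligned with its measurements before the two sets are stacked.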
I keep only the columns whose names contain "mean()" or "std()", which results in 33*2=66 columns.
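
One possible way to pick these columns is a pattern match on the feature names. The sketch below uses a handful of example names in a hypothetical vector called "features"; in the real data the names would be read from the feature list that ships with the data set.

```r
# A few example feature names; the vector name "features" is hypothetical.
features <- c("tBodyAcc-mean()-X",
              "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X",
              "angle(tBodyAccMean,gravity)",
              "fBodyBodyGyroJerkMag-mean()")

# Escape the parentheses so only the literal "mean()" and "std()" match.
keep <- grepl("mean\\(\\)|std\\(\\)", features)

# Names that survive the filter.
selected <- features[keep]
```

Matching the literal parentheses is what leaves out names such as "fBodyAcc-meanFreq()-X", which contain "mean" but not "mean()".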
Descriptive activity names can be added easily by replacing the activity IDs from the files "train/y_train.txt" and "test/y_test.txt" with the labels from the file "activity_labels.txt".
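
A minimal sketch of this replacement, assuming the labels have been read into a data frame with hypothetical columns "id" and "label", and using a short made-up vector of activity IDs:

```r
# The six activity labels; the column names "id" and "label" are assumptions.
activity_labels <- data.frame(
  id    = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING")
)

# Made-up activity IDs standing in for the combined y_train/y_test values.
activity_id <- c(5, 4, 1, 2, 3)

# Turning the IDs into a factor maps each ID to its label.
activity <- factor(activity_id,
                   levels = activity_labels$id,
                   labels = activity_labels$label)
```

Using a factor keeps the original row order, which a merge on the ID column would not guarantee.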
I think names like "fBodyBodyGyroJerkMag-mean()" are descriptive enough and already quite long, so there is no need to expand them. I just remove "-" and "()" and capitalize the words "mean" and "std".
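
The renaming described above could look roughly like this. The helper name "clean_name" is hypothetical and only meant to illustrate the string substitutions.

```r
# Hypothetical helper: strip "-" and "()", then capitalize "mean" and "std".
clean_name <- function(x) {
  x <- gsub("-", "", x, fixed = TRUE)   # remove every hyphen
  x <- gsub("()", "", x, fixed = TRUE)  # remove the empty parentheses
  x <- sub("mean", "Mean", x, fixed = TRUE)
  x <- sub("std", "Std", x, fixed = TRUE)
  x
}

cleaned <- clean_name(c("fBodyBodyGyroJerkMag-mean()", "tBodyAcc-std()-X"))
```

With fixed = TRUE the patterns are treated as plain strings, so the parentheses need no escaping.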
From the data set in step 4, I create a second, independent tidy data set with the average of each variable for each activity and each subject.
This can be done with the function "summarise_each" from the "dplyr" package.
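
To illustrate the idea without extra packages, the sketch below swaps in base R's aggregate in place of "summarise_each"; it is not the original script. The data frame and its values are made up, and the column names follow the renaming convention described earlier.

```r
# Made-up data in the shape of the cleaned data set.
merged <- data.frame(
  subject       = c(1, 1, 1, 2, 2),
  activity      = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  tBodyAccMeanX = c(0.2, 0.4, 0.1, 0.5, 0.7),
  tBodyAccStdX  = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Average every measurement column within each subject/activity pair.
tidy <- aggregate(. ~ subject + activity, data = merged, FUN = mean)
```

The result has one row per subject/activity combination that occurs in the data and one column per averaged variable.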