Tidy data for the "Human Activity Recognition Using Smartphones Dataset"

Description of files

  • run_analysis.R reads the original data set from a subfolder named "UCI HAR Dataset" and creates solution.txt in the working directory. Both the input folder and the output file location can be changed at the top of the script.
  • generate_codebook.R is a small helper script for creating the code book. It can only be run after run_analysis.R has been run. It creates outline.txt.
  • solution.txt is the tidy data set created by run_analysis.R
  • outline.txt is the input for the code book; it is generated by generate_codebook.R
  • CodeBook.md is the code book
  • README.md is this file

Description of data processing

The original data is assumed to be in a subfolder named "UCI HAR Dataset" of the working directory. The input data is split into two separate sets, a "training" set and a "test" set, which are first merged. The activity identifiers are then translated into human-readable names. Next, the mean and standard deviation of each measurement are extracted; other statistics are discarded. The input data contains measurements of subjects performing certain activities, and the same subject can perform the same activity multiple times. In the final data set, the average of each mean and standard deviation variable is therefore reported per subject-activity combination. The final data set can be found in the solution.txt file, which can be read into R with read.table("solution.txt", header = TRUE).
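
The processing steps described above can be illustrated with a minimal base R sketch. This is not the code of run_analysis.R: it runs on small made-up data frames instead of the real files, and the column names (subject, activity, tBodyAcc.mean.X and so on), the activity labels and all values are assumptions chosen for illustration only.

```r
# Toy stand-ins for the "training" and "test" sets (hypothetical values).
train <- data.frame(
  subject         = c(1, 1, 2),
  activity        = c(1, 1, 2),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.1),
  tBodyAcc.std.X  = c(0.5, 0.7, 0.3),
  tBodyAcc.max.X  = c(0.9, 0.8, 0.7)
)
test <- data.frame(
  subject         = c(3, 3),
  activity        = c(1, 1),
  tBodyAcc.mean.X = c(0.6, 0.8),
  tBodyAcc.std.X  = c(0.1, 0.3),
  tBodyAcc.max.X  = c(1.0, 0.9)
)

# Hypothetical lookup table from activity identifier to readable name.
activity_labels <- data.frame(
  id   = c(1, 2),
  name = c("WALKING", "SITTING"),
  stringsAsFactors = FALSE
)

# 1. Merge the training and test sets.
merged <- rbind(train, test)

# 2. Translate activity identifiers into human-readable names.
merged$activity <- activity_labels$name[match(merged$activity, activity_labels$id)]

# 3. Keep only the mean and standard deviation columns; drop other statistics.
is_mean_or_std <- grepl("\\.(mean|std)\\.", names(merged))
selected <- merged[, c("subject", "activity", names(merged)[is_mean_or_std])]

# 4. Average every remaining variable per subject-activity combination.
tidy <- aggregate(. ~ subject + activity, data = selected, FUN = mean)

print(tidy)
```

In this toy example subject 1 performs the first activity twice, so the two rows collapse into a single row holding the averages. The result has one row per subject-activity combination, which is the shape the text describes for solution.txt.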