/har_data_cleaning

Project for "Getting and Cleaning Data" course on Coursera.

Primary LanguageR

README.md

This git repository has three files:

  1. README.md - this file
  2. run_analysis.r - an R script used to clean and "tidy" [1] raw human activity data
  3. CodeBook.md - a markdown document containing descriptions of the input, transformations, and output data referred to in run_analysis.r

Before running "run_analysis.R", you'll need:

To set up your R environment before running the script, make sure that both the run_analysis.R script and the raw data (in a sub-directory named "UCI HAR Dataset") are both in your working directory. For example:

./ (working directory)
    run_analysis.R
    UCI HAR Dataset/
        test/
        train/
        activity_labels.txt
        ...

Finally, run the R script, which will do the following:

  1. Read in the X (measurement), Y (activity), and subject datasets for both the train and test partitions.
  2. Merge the 6 datasets above into a single dataset
  3. Re-label the activities using meaningful descriptions instead of coded values.
  4. Remove some of the measurement variables (any that aren't the mean or standard deviation).
  5. Add meaningful variable names, based on the "features.txt" metadata included with the original datasets. - Measurement variable names are abbreviated, to avoid being too long. - Some of the variable names appear to have included typos (e.g. "BodyBody") which have been corrected.
  6. Create two datasets in the working directory: 1. har_tidy.txt - the "tidy" dataset, with mean and std. deviations of measurements, along with identifying information about each set of measurements. 2. har_summary.txt - a second "tidy" dataset, this one containing the average measurements by subject and activity.

For more information on the contents of these datasets, see the included CodeBook.md.

Citations:

[1] Wickham, H. Tidy Data. Retrieved October 25, 2015, from http://vita.had.co.nz/papers/tidy-data.pdf

[2] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.