UCI HAR database clean-up/summary
Files contained include
- this readme
- CodeBook.md
- run_analysis.R
- UCI HAR dataset
- tidy_data.txt
The repo is used to create a summary tidy data set for the test and training data from the UCI HAR dataset (http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones)
- R installed
- this repo downloaded
- run the run_analysis.r script file
- The script will load the training dataset adding descriptive column names from the features.txt file
- It will then load the subjects and the activities
- The process will repeat for the test data set.
- The datasets will be merged together by appending the rows together.
- The columns which contain mean and std values will be selected
- The resulting dataset will have descriptive activity names applied using the activity_labels.txt file
- Finally, a summary dataset will be created by grouping the dataset by subject and activity. Each group will then be processed by computing the mean value for each variable.
The resulting dataset contains one row per subject and activity and the variables are the mean of the observerations for that variable for the subject/activity pairing. For example, if Subject 1 had 10 rows for walking then there will be 1 row for subject 1 walking and each variable is the mean of the 10 individual rows.