This repository contains the course project for the Getting and Cleaning Data course, part of the Coursera and Johns Hopkins University Data Science Specialization. It uses the UCI Human Activity Recognition Using Smartphones data set to retrieve and clean the data and create a tidy data set with the characteristics defined in the corresponding Codebook.
The script does the following:
- Merges the subjects, features and labels for both the training and test sets.
- Keeps only the mean and standard deviation features for each measurement.
- Replaces activity identifiers with descriptive activity names.
- Renames the columns with descriptive variable names.
- Computes the average of each variable for each activity and each subject.
- Writes the tidy data set without row names. To read it back into R, use:
d <- read.table("data/tidy.txt", stringsAsFactors = FALSE, header = TRUE)
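The filtering, relabelling, renaming and averaging steps in the list above can be illustrated in base R on a tiny invented data frame. This is a sketch, not the code in run_analysis.R: the feature names imitate the UCI naming pattern (for example tBodyAcc-mean()-X), while the values and the two-entry activity lookup are made up for illustration.

```r
# Toy stand-in for the merged training + test data (values are invented).
merged <- data.frame(
  subject = c(1, 1, 2, 2),
  activity = c(1, 2, 1, 1),
  "tBodyAcc-mean()-X" = c(0.2, 0.4, 0.6, 0.8),
  "tBodyAcc-std()-X" = c(-0.9, -0.7, -0.5, -0.3),
  "tBodyAcc-mad()-X" = c(0.1, 0.1, 0.1, 0.1),
  check.names = FALSE
)

# Hypothetical activity lookup (the real labels live in the data set's activity file).
activity_labels <- c("1" = "WALKING", "2" = "WALKING_UPSTAIRS")

# Keep only the mean() and std() features, plus the identifier columns.
keep <- grepl("mean\\(\\)|std\\(\\)", names(merged))
tidy_input <- merged[, c("subject", "activity", names(merged)[keep])]

# Replace numeric activity ids with descriptive names.
tidy_input$activity <- unname(activity_labels[as.character(tidy_input$activity)])

# Readable, syntactic column names: "tBodyAcc-mean()-X" becomes "tBodyAcc.mean.X".
names(tidy_input) <- gsub("\\(\\)", "", names(tidy_input))
names(tidy_input) <- make.names(names(tidy_input))

# Average every remaining variable for each subject/activity pair.
tidy <- stats::aggregate(. ~ subject + activity, data = tidy_input, FUN = mean)
tidy <- tidy[order(tidy$subject, tidy$activity), ]
print(tidy)
```

Matching the literal parentheses in mean() and std() excludes features such as tBodyAcc-mad()-X here; how the real script treats similar names (for example meanFreq()) is not stated in this README.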
By default, the script reads the extracted data set located in the [data/UCI HAR Dataset](./data/UCI%20HAR%20Dataset) directory, cleans it and creates the tidy data set. The script automatically installs any missing package dependencies and loads them as needed.
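The automatic dependency handling described above could look roughly like the following base R sketch. ensure_packages() is a hypothetical name rather than a function from run_analysis.R, and the CRAN mirror URL is an assumption.

```r
# Hypothetical helper: install any missing packages, then attach the available ones.
ensure_packages <- function(pkgs, install = TRUE) {
  # requireNamespace() returns FALSE (without erroring) when a package is absent.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (install && length(missing) > 0) {
    install.packages(missing, repos = "https://cloud.r-project.org")
  }
  # Attach every package that is now available.
  for (pkg in pkgs) {
    if (requireNamespace(pkg, quietly = TRUE)) {
      library(pkg, character.only = TRUE)
    }
  }
  # Report what was missing before any installation attempt.
  invisible(missing)
}
```

Calling it with install = FALSE only reports which packages are missing, which is handy for checking an environment without touching it.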
Rscript run_analysis.R
Usage: run_analysis.R [-[-resource|r] <character>] [-[-out|o] <character>] [-[-verbose|v]] [-[-help|h]]
-r|--resource local path or url to the data set (optional)
-o|--out output filename. If empty string "" it prints to console (optional)
-v|--verbose print out verbose information during analysis (optional)
-h|--help this help
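A minimal sketch of how the -o/--out option could be honoured using only base R. The helper name write_tidy() is hypothetical, and the default path data/tidy.txt is inferred from the examples in this README rather than taken from the script.

```r
# Hypothetical helper mirroring the -o/--out option:
# an empty string prints to the console, anything else is treated as a file path.
write_tidy <- function(tidy, out = "data/tidy.txt") {
  if (nzchar(out)) {
    # Make sure the target directory (e.g. data/) exists before writing.
    dir.create(dirname(out), showWarnings = FALSE, recursive = TRUE)
  }
  # write.table() treats file = "" as standard output, so "" needs no special case.
  utils::write.table(tidy, file = out, row.names = FALSE)
  invisible(tidy)
}
```

Because write.table() already uses file = "" for the console, the empty-string convention of -o maps onto it directly, and the output can be read back with read.table(..., header = TRUE).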
Download and extract the data set, run the analysis with verbose output, and write the tidy data set to ./data.txt:
Rscript run_analysis.R -v -r "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip" -o "./data.txt"
Extract a local copy of the zip archive, run the analysis, and print the tidy data set to the console:
Rscript run_analysis.R -r "data/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip" -o ""
Run the analysis on an already extracted local directory and write the tidy data set to ./data/tidy.txt:
Rscript run_analysis.R -r data/UCI\ HAR\ Dataset/
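The examples above differ only in what -r points to: a URL, a local zip archive, or an extracted directory. One way to normalise those cases in base R is sketched here; resolve_resource() is a hypothetical helper, and the assumption that the archive extracts to a folder named UCI HAR Dataset comes from the default location mentioned earlier in this README.

```r
# Hypothetical helper: turn a -r/--resource value into a local data directory.
# Assumes the zip archive extracts to a folder named "UCI HAR Dataset".
resolve_resource <- function(resource, dest = "data") {
  # Remote URL: download the archive into dest first.
  if (grepl("^https?://", resource)) {
    dir.create(dest, showWarnings = FALSE, recursive = TRUE)
    zipfile <- file.path(dest, "dataset.zip")
    utils::download.file(resource, zipfile, mode = "wb")
    resource <- zipfile
  }
  # Local zip archive: extract it into dest.
  if (grepl("\\.zip$", resource, ignore.case = TRUE)) {
    utils::unzip(resource, exdir = dest)
    resource <- file.path(dest, "UCI HAR Dataset")
  }
  # Whatever remains must be an existing, already extracted directory.
  if (!dir.exists(resource)) {
    stop("data directory not found: ", resource)
  }
  normalizePath(resource)
}
```

Checking the URL case first lets a download fall through to the same unzip step used for a local archive, so each input type is handled by exactly one branch plus the shared directory check.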
- This script has only been tested on Mac OS X.