Getting and Cleaning Data Course Project
The data is analyzed as per the code provided in the Peer Assessments/Getting and Cleaning Data Course Project
The R code is added as run_analysis.R at the path.
The code is as below,
fileURL <- 'https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip'
#data file dataFileZIP <- "./getdata-projectfiles-UCI-HAR-Dataset.zip"
#working directory dirFile <- "./UCI HAR Dataset"
tidyDataFile <- "./tidyUCIdataset.txt"
tidyDataFileAvgtxt <- "./tidyUCIdatasetAvg.txt"
if (file.exists(dataFileZIP) == FALSE) { download.file(fileURL, destfile = dataFileZIP) }
if (file.exists(dirFile) == FALSE) { unzip(dataFileZIP) }
x_train <- read.table("./UCI HAR Dataset/train/X_train.txt", header = FALSE) X_test <- read.table("./UCI HAR Dataset/test/X_test.txt", header = FALSE) y_train <- read.table("./UCI HAR Dataset/train/y_train.txt", header = FALSE) y_test <- read.table("./UCI HAR Dataset/test/y_test.txt", header = FALSE) subject_train <- read.table("./UCI HAR Dataset/train/subject_train.txt", header = FALSE) subject_test <- read.table("./UCI HAR Dataset/test/subject_test.txt", header = FALSE)
x <- rbind(x_train, X_test) y <- rbind(y_train, y_test) s <- rbind(subject_train, subject_test)
features <- read.table("./UCI HAR Dataset/features.txt")
names(features) <- c('feat_id', 'feat_name')
Search for matches to argument mean or standard deviation (sd) within each element of character vector
index_features <- grep("-mean\(\)|-std\(\)", features$feat_name) x <- x[, index_features]
names(x) <- gsub("\(|\)", "", (features[index_features, 2]))
activities <- read.table("./UCI HAR Dataset/activity_labels.txt")
names(activities) <- c('act_id', 'act_name') y[, 1] = activities[y[, 1], 2] names(y) <- "Activity" names(s) <- "Subject"
tidyDataSet <- cbind(s, y, x)
5. Creates a 2nd, independent tidy data set with the average of each variable for each activity and each subject:
p <- tidyDataSet[, 3:dim(tidyDataSet)[2]] tidyDataAvgSet <- aggregate(p,list(tidyDataSet$Subject, tidyDataSet$Activity), mean)
names(tidyDataAvgSet)[1] <- "Subject" names(tidyDataAvgSet)[2] <- "Activity"# Created csv (tidy data set) in diretory
write.table(tidyDataSet, tidyDataFile)
The steps are done by the code as below,
1.Check if the file exists, otherwise download it; 2.Check if the file has already been extracted to the directory. Else extract it. 3.Load both test and train data 4.Load the features and activity labels 5.Extract the mean and standard deviation for each columns 6.Merge and create the final data set: 7.The result is saved as "./tidyUCIdatasetAvg.txt", a 180x68 data table (181 with column name) and average values.