Course: Getting and Cleaning Data
Submission Date: 17.09.2018
Prepared by: Oleksandr Titorchuk
This file provides details on how the original data was processed, and how the script which handles it works.
The script is divided into 5 blocks according to the exercise description.
The proper work of this script requires 'reshape2' package to be downloaded.
Preparatory work includes checking if the directory with original dataset exists ("UCI HAR Dataset"),and if not - downloading it, and also reading the neccesary data into R.
"xTrain"/"xTest" datasets are merged together through the rbind() function. Column names are assigned to the resulting table from the "features" file.
grepl () function is used to create the logical vector, indicating which variables' names contain 'mean()' and 'std()'. This logical vector let us to subset only the needed columns from the 'X' dataset.
The next step is to merge "activity" and "subject" datasets, which will represent the first 2 columns of the final dataset, and also replace the id numbers in 'activity' column with activity names.
gsub() function is used to rename variables' names of the 'X' data.frame to make them more descriptive.
5. Creates a second, independent tidy data set with the average of each variable for each activity and each subject
The 'reshape2' package let us easily melt and cast the data by 'subject' and 'activity' applying the mean() function. After that we need to rename the columns of the tidy dataset to reflect that they are averaged now ('.averaged'). And finnaly save the tidy dataset into "tidy.data.txt" file.