These folders were uploaded for the course project assignment of the "Getting and Cleaning Data" course on coursera.com.
The run_analysis.R file contains 3 functions: run_analysis, mergedata and casting. The mean_output.txt file is the output of the run_analysis function. The code book.md file describes the variables in mean_output.txt.
The main function is run_analysis, as required by the project instructions. Overall, the function consists of 4 parts.
This assignment uses the "plyr" and "reshape2" packages.
The idea is to build a separate function called "mergedata" to merge the train and test datasets, since 3 train/test pairs need to be merged. The datasets are first opened as connections with the file() function (6 files in total) and passed to the mergedata() function. The close() function is used to close the connections at the end of this part.
The variable names for "merge.x" and the activity labels for "merge.y" are stored as text files in the original dataset (features.txt and activity_labels.txt). They are therefore read into data frames first for later use: feature.df stores the variable names and actlabel.df stores the activity labels.
In this part, the column names of the merged datasets are assigned for later use: merge.x - col.names, which is subsetted from feature.df; merge.y - "activity"; merged.subject - subjects.
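The two steps above (reading the name files, then assigning column names) can be sketched roughly as follows. This is a minimal illustration, not the code in run_analysis.R: the inline text stands in for features.txt and activity_labels.txt, the tiny merge.x, merge.y and merged.subject frames are made-up data, and the column names "index", "name", "code", "label" and "subjects" are assumptions.

```r
# Inline text stands in for features.txt and activity_labels.txt (assumption).
feature.df <- read.table(
  text = "1 tBodyAcc-mean()-X\n2 tBodyAcc-std()-X\n3 tBodyAcc-mad()-X",
  col.names = c("index", "name"), stringsAsFactors = FALSE
)
actlabel.df <- read.table(
  text = "1 WALKING\n2 SITTING",
  col.names = c("code", "label"), stringsAsFactors = FALSE
)

# Made-up stand-ins for the merged datasets.
merge.x        <- data.frame(c(0.1, 0.2), c(0.3, 0.4), c(0.5, 0.6))
merge.y        <- data.frame(V1 = c(2, 1))
merged.subject <- data.frame(V1 = c(1, 1))

# Assign the column names; names<- keeps "-" and "()" unchanged.
names(merge.x)        <- feature.df$name
names(merge.y)        <- "activity"
names(merged.subject) <- "subjects"
```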
A new data frame called sub.df is subsetted from merge.x, keeping the columns whose names contain "mean" or "std", as instructed in the assignment, using the grep() function. The activity codes are replaced by descriptive labels from actlabel.df in a for loop. By the end of this part, a subsetted data frame is ready for the next step.
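A small sketch of this subsetting and relabelling step, on made-up data. The pattern "mean|std", the actlabel.df column names "code" and "label", and the sample values are assumptions for illustration; the actual function may build the pattern and loop differently.

```r
# Made-up stand-in for merge.x with three feature columns.
merge.x <- data.frame(c(0.1, 0.2), c(0.3, 0.4), c(0.5, 0.6))
names(merge.x) <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X")

# Keep only the columns whose names contain "mean" or "std".
keep   <- grep("mean|std", names(merge.x))
sub.df <- merge.x[, keep, drop = FALSE]

# Replace numeric activity codes with descriptive labels in a for loop.
activity    <- c(2, 1)
actlabel.df <- data.frame(code = c(1, 2), label = c("WALKING", "SITTING"),
                          stringsAsFactors = FALSE)
labelled <- as.character(activity)
for (i in seq_len(nrow(actlabel.df))) {
  labelled[activity == actlabel.df$code[i]] <- actlabel.df$label[i]
}
sub.df$activity <- labelled
```

Note that a plain "mean" pattern also matches any feature name that merely contains that string (for example a name containing "meanFreq"), so a stricter pattern may be wanted depending on how the assignment is read.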
sub.df is split by subject id using the split() function and passed to the casting() function to calculate the mean of each variable. mean_output.txt is created with the write.table() function; this is the file uploaded to coursera.com.
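The split-then-summarise-then-write flow might look like the sketch below. To keep it runnable without extra packages, base R aggregate() is swapped in for casting() here; the sample data, the column names and the output file name mean_output_demo.txt are all hypothetical.

```r
# Made-up stand-in for sub.df with subject and activity columns attached.
sub.df <- data.frame(
  subject  = c(1, 1, 2, 2),
  activity = c("WALKING", "WALKING", "SITTING", "SITTING"),
  mean.x   = c(0.25, 0.75, 1, 3),
  stringsAsFactors = FALSE
)

# One data frame per subject id.
by.subject <- split(sub.df, sub.df$subject)

# Stand-in for casting(): average each variable by activity within a subject.
subject.means <- lapply(by.subject, function(d) {
  aggregate(mean.x ~ subject + activity, data = d, FUN = mean)
})
mean.df <- do.call(rbind, subject.means)

# Write the tidy result to a text file (temporary location for the demo).
out.path <- file.path(tempdir(), "mean_output_demo.txt")
write.table(mean.df, out.path, row.names = FALSE)
```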
This function is designed to save memory and avoid redundant code, since three datasets need to be merged. The arguments are two file connections and the desired class of the returned data frame, which defaults to "factor".
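A function with this shape could be sketched as below. The argument names (train.con, test.con, col.class) and the use of read.table() plus rbind() are assumptions; only the two connections and the "factor" default come from the description. The temporary files stand in for a real train/test pair.

```r
# Hypothetical helper: read a train/test pair from open connections and stack them.
mergedata <- function(train.con, test.con, col.class = "factor") {
  train <- read.table(train.con, colClasses = col.class)
  test  <- read.table(test.con,  colClasses = col.class)
  rbind(train, test)  # train rows first, then test rows
}

# Tiny stand-in files so the sketch runs without the real dataset.
train.path <- tempfile(fileext = ".txt")
test.path  <- tempfile(fileext = ".txt")
writeLines(c("1", "2"), train.path)
writeLines("3", test.path)

# Open the connections explicitly so they can be closed by the caller afterwards.
train.con <- file(train.path, "r")
test.con  <- file(test.path, "r")
merged.demo <- mergedata(train.con, test.con)
close(train.con)
close(test.con)
```

For the numeric measurement files, the same helper would presumably be called with col.class = "numeric" instead of the default.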
This function melts and casts the split data to calculate the mean of each variable by activity for each subject.
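As a rough illustration of the melt-and-cast idea, the sketch below uses melt() and dcast() from reshape2 on made-up data for a single subject. The function body, the id columns "subject" and "activity", and the sample values are assumptions and may not match the actual casting() in run_analysis.R.

```r
library(reshape2)

# Hypothetical casting(): long format first, then one mean per variable.
casting <- function(subject.df) {
  molten <- melt(subject.df, id.vars = c("subject", "activity"))
  dcast(molten, subject + activity ~ variable, fun.aggregate = mean)
}

# Made-up data for one subject: two WALKING rows and one SITTING row.
demo <- data.frame(
  subject  = c(1, 1, 1),
  activity = c("WALKING", "WALKING", "SITTING"),
  mean.x   = c(0.25, 0.75, 1),
  std.x    = c(0.5, 1.5, 2),
  stringsAsFactors = FALSE
)
result <- casting(demo)
```

The result has one row per subject and activity combination, with each measurement column holding the average of the original rows.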