"Getting And Cleaning Data" Course Project

Coursera "Getting and Cleaning Data" (Johns Hopkins) course
Course Project
Jon Ide

You should create one R script called run_analysis.R that does the following.

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Note: run_analysis.R assumes that the working directory it runs in contains the subfolders "test" and "train"
Read 'features.txt' into data frame features
Read 'activity_labels.txt' into data frame activity_labels
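A minimal base R sketch of these two reads, assuming each file holds two whitespace-separated columns (an integer id and a name), as in the UCI HAR Dataset. The column names id, name, code and label are illustrative choices, not necessarily the ones run_analysis.R uses, and a few toy lines are written to a temporary directory so the example runs on its own:

```r
# Write a few toy lines in the assumed "<id> <name>" format so the example
# is self-contained (ASSUMPTION: whitespace-separated, no header row).
tmp <- tempdir()
writeLines(c("1 tBodyAcc-mean()-X", "2 tBodyAcc-mean()-Y", "3 tBodyAcc-std()-X"),
           file.path(tmp, "features.txt"))
writeLines(c("1 WALKING", "2 WALKING_UPSTAIRS", "3 WALKING_DOWNSTAIRS"),
           file.path(tmp, "activity_labels.txt"))

# Read both lookup tables; the column names are illustrative choices.
features <- read.table(file.path(tmp, "features.txt"),
                       col.names = c("id", "name"), stringsAsFactors = FALSE)
activity_labels <- read.table(file.path(tmp, "activity_labels.txt"),
                              col.names = c("code", "label"),
                              stringsAsFactors = FALSE)
str(features)
str(activity_labels)
```

stringsAsFactors = FALSE keeps the names as plain character strings; R versions before 4.0 would otherwise read them as factors.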
Read the training data

Read the training data 'X_train.txt' file into data frame x_train
Read the training data activity codes 'y_train.txt' file into data frame y_train
Read the training data subject codes 'subject_train.txt' file into data frame subject_train

Read the test data 'X_test.txt' file into data frame x_test
Read the test data activity codes 'y_test.txt' file into data frame y_test
Read the test data subject codes 'subject_test.txt' file into data frame subject_test
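Because the training and test reads differ only in the set name, they can be wrapped in one helper. The sketch below is hypothetical: read_set() is not part of run_analysis.R, it assumes the UCI HAR file names (X_<set>.txt with a capital X, y_<set>.txt and subject_<set>.txt) inside the "train" and "test" subfolders, and it names the single activity and subject columns at read time rather than after rbind. A toy copy of that layout is built in a temporary directory:

```r
# HYPOTHETICAL helper (not part of run_analysis.R): read one set's three files.
# ASSUMES the layout <root>/<set>/X_<set>.txt, y_<set>.txt, subject_<set>.txt.
read_set <- function(root, set) {
  path <- function(prefix) file.path(root, set, paste0(prefix, "_", set, ".txt"))
  list(
    x       = read.table(path("X")),                          # one column per feature
    y       = read.table(path("y"), col.names = "Activity"),  # numeric activity codes
    subject = read.table(path("subject"), col.names = "Subject")
  )
}

# Build a toy copy of the assumed layout in a temporary directory.
root <- file.path(tempdir(), "har_toy")
for (set in c("train", "test")) {
  dir.create(file.path(root, set), recursive = TRUE, showWarnings = FALSE)
  writeLines(c("0.1 0.2 0.3", "0.4 0.5 0.6"),
             file.path(root, set, paste0("X_", set, ".txt")))
  writeLines(c("1", "2"), file.path(root, set, paste0("y_", set, ".txt")))
  writeLines(c("1", "3"), file.path(root, set, paste0("subject_", set, ".txt")))
}

train <- read_set(root, "train")
test  <- read_set(root, "test")
str(train)
```

In a script run from the data folder, root would simply be ".", so read_set(".", "train") and read_set(".", "test") replace the six separate read.table() calls.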

Use rbind to combine the training and testing measurement data into data frame x_combined
Find the features (from features.txt) whose names contain "-mean()" or "-std()" and keep just those columns of x_combined in data frame x_combined_means_sds
Use rbind to combine the training and testing activity codes into data frame y_combined with column name "Activity"
Use rbind to combine the training and testing subjects into data frame subject_combined with column name "Subject"
Replace the numeric activity codes in y_combined with their text labels from activity_labels
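Combine subject_combined, y_combined and x_combined_means_sds column by column into data frame combined_data, and add a column "SubjAct" that contains the subject and activity concatenated with separator "-". This will be used by group_by.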
Using dplyr, group combined_data by SubjAct and average every measurement column, giving data frame grouped
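The merge, selection, labelling and averaging steps can be sketched in base R on toy data. This is an illustration under assumptions, not the code in run_analysis.R: the README uses dplyr's group_by for the averaging, while this sketch swaps in base R's aggregate() so it needs no extra packages, and the Temp column the README mentions is not reproduced. The toy objects stand in for the data frames read earlier (read.table names their columns V1, V2, ...):

```r
# Toy stand-ins (ASSUMED shapes) for the data frames read earlier.
features <- data.frame(id = 1:3,
                       name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                                "tBodyAcc-meanFreq()-X"),
                       stringsAsFactors = FALSE)
activity_labels <- data.frame(code = 1:2, label = c("WALKING", "WALKING_UPSTAIRS"),
                              stringsAsFactors = FALSE)
x_train <- data.frame(V1 = c(1, 3), V2 = c(10, 30), V3 = c(5, 5))
x_test  <- data.frame(V1 = c(5, 8), V2 = c(50, 80), V3 = c(5, 5))
y_train <- data.frame(V1 = c(1, 1))
y_test  <- data.frame(V1 = c(2, 1))
subject_train <- data.frame(V1 = c(1, 1))
subject_test  <- data.frame(V1 = c(2, 1))

# Stack the training rows on top of the test rows.
x_combined <- rbind(x_train, x_test)

# Keep only features whose names contain "-mean()" or "-std()".
# fixed = TRUE matches the parentheses literally, so "-meanFreq()" is dropped.
keep <- grepl("-mean()", features$name, fixed = TRUE) |
        grepl("-std()", features$name, fixed = TRUE)
x_combined_means_sds <- x_combined[, keep, drop = FALSE]
names(x_combined_means_sds) <- features$name[keep]

# Stack activity codes and subjects the same way and name their columns.
y_combined <- setNames(rbind(y_train, y_test), "Activity")
subject_combined <- setNames(rbind(subject_train, subject_test), "Subject")

# Replace numeric activity codes with their text labels.
y_combined$Activity <- activity_labels$label[match(y_combined$Activity,
                                                   activity_labels$code)]

# Bind the columns together and build the "subject-activity" key.
combined_data <- cbind(subject_combined, y_combined, x_combined_means_sds)
combined_data$SubjAct <- paste(combined_data$Subject, combined_data$Activity,
                               sep = "-")

# Average every kept measurement within each SubjAct group
# (base R aggregate() swapped in for dplyr's group_by()).
grouped <- aggregate(combined_data[names(x_combined_means_sds)],
                     by = list(SubjAct = combined_data$SubjAct), FUN = mean)
print(grouped)
```

Passing by = combined_data[c("Subject", "Activity")] to aggregate() would group on both columns directly and make the SubjAct key, and the later step that splits it apart, unnecessary; the sketch keeps SubjAct to mirror the README.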

Restore the Subject and Activity columns and remove the SubjAct and Temp columns
Clean up the variable names
Turn Subject into an integer so it sorts numerically (2 before 10), then sort the table on Subject using dplyr's arrange
Turn Activity into a factor (not really necessary)
Save the tidy data set in text file 'tidy.txt'
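A base R sketch of these finishing steps, run on a toy grouped table. It assumes grouped has a SubjAct key plus one averaged column per feature. The name-cleaning rules (dropping "()" and replacing "-" with ".") are hypothetical because the README does not list its rules, base R's order() stands in for dplyr's arrange(), and the file is written to a temporary directory instead of the working directory:

```r
# Toy `grouped` table (ASSUMED shape: a SubjAct key plus averaged features).
grouped <- data.frame(SubjAct = c("10-SITTING", "2-WALKING", "1-WALKING"),
                      "tBodyAcc-mean()-X" = c(0.2, 0.3, 0.1),
                      "tBodyAcc-std()-X"  = c(-0.9, -0.8, -0.7),
                      check.names = FALSE, stringsAsFactors = FALSE)

# Restore Subject and Activity by splitting the key on "-"; the activity
# labels use underscores (e.g. WALKING_UPSTAIRS), so each key has two parts.
parts <- do.call(rbind, strsplit(grouped$SubjAct, "-", fixed = TRUE))
tidy <- data.frame(Subject = parts[, 1], Activity = parts[, 2],
                   grouped[setdiff(names(grouped), "SubjAct")],
                   check.names = FALSE, stringsAsFactors = FALSE)

# HYPOTHETICAL name clean-up: drop "()" and turn "-" into ".".
names(tidy) <- gsub("-", ".", gsub("()", "", names(tidy), fixed = TRUE),
                    fixed = TRUE)

# Sort numerically on Subject (as text, "10" would sort before "2").
tidy$Subject <- as.integer(tidy$Subject)
tidy <- tidy[order(tidy$Subject), ]
rownames(tidy) <- NULL

# Optional: store Activity as a factor.
tidy$Activity <- factor(tidy$Activity)

# Write the result; row.names = FALSE leaves R's row numbers out of the file.
out <- file.path(tempdir(), "tidy.txt")
write.table(tidy, out, row.names = FALSE)
```

Converting Subject with as.integer() before sorting is what places subject 10 after subject 2. The file can be checked by reading it back with read.table(out, header = TRUE).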
