The script `run_analysis.R` performs the five steps of the Coursera Getting and Cleaning Data course project. The steps are implemented in the following order:
- Extracts only the measurements on the mean and standard deviation for each measurement.
  - The activity labels and features are loaded first. A regular expression then keeps only the features with 'mean' or 'std' in their name, and `gsub()` replaces '-mean' and '-std' with more readable names.
- Merges the training and the test sets to create one data set.
  - After the train and test sets are loaded, they are merged into one data set that contains the desired features (mean and std) together with the corresponding subject and activity.
- Uses descriptive activity names to name the activities in the data set.
  - After the merge, the activity ids in the data set are replaced with their descriptive activity labels.
- Appropriately labels the data set with descriptive variable names.
  - The descriptive feature names are applied to the columns with the `colnames()` function.
- From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
  - Using the `aggregate` function, a tidy data set is written containing the average of each variable (feature) for each subject and activity pair.
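The feature selection, merge and labelling steps can be sketched in R as below. This is a minimal, self-contained illustration under stated assumptions, not the contents of `run_analysis.R`: instead of reading the UCI HAR files, it builds tiny made-up stand-ins for `features`, `labels` and the train/test sets. The regular expression, the replacement names, and the extra `gsub()` that strips '-' and '()' are all assumptions; `featuresMeanAndStdNames` is a hypothetical helper variable.

```r
# Hypothetical stand-ins for features.txt and activity_labels.txt
# (same shape as the UCI HAR files, made-up content)
features <- data.frame(
  id   = 1:4,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
           "tBodyAcc-energy()-X", "fBodyGyro-mean()-Z"),
  stringsAsFactors = FALSE
)
labels <- data.frame(
  id       = 1:2,
  activity = c("WALKING", "SITTING"),
  stringsAsFactors = FALSE
)

# Keep the ids of features whose name contains "mean" or "std"
featuresMeanAndStdList <- grep(".*mean.*|.*std.*", features$name)
featuresMeanAndStdNames <- features$name[featuresMeanAndStdList]
# Readable names (assumed): "-mean" -> "Mean", "-std" -> "Std",
# then drop any remaining "-", "(" and ")"
featuresMeanAndStdNames <- gsub("-mean", "Mean", featuresMeanAndStdNames)
featuresMeanAndStdNames <- gsub("-std", "Std", featuresMeanAndStdNames)
featuresMeanAndStdNames <- gsub("[-()]", "", featuresMeanAndStdNames)

# Hypothetical measurements standing in for X_train.txt / X_test.txt,
# filtered to the selected feature columns
set.seed(42)
trainSet <- as.data.frame(matrix(runif(3 * 4), nrow = 3))[, featuresMeanAndStdList]
testSet  <- as.data.frame(matrix(runif(2 * 4), nrow = 2))[, featuresMeanAndStdList]
trainLabels   <- c(1L, 2L, 1L)  # activity ids (as in y_train.txt)
trainSubjects <- c(1L, 1L, 2L)  # subject ids (as in subject_train.txt)
testLabels    <- c(2L, 1L)
testSubjects  <- c(3L, 3L)

# Attach subject and activity to each set, then stack train and test
train   <- data.frame(subject = trainSubjects, activity = trainLabels, trainSet)
test    <- data.frame(subject = testSubjects, activity = testLabels, testSet)
dataset <- rbind(train, test)

# Replace activity ids with their descriptive labels
dataset$activity <- factor(dataset$activity, levels = labels$id,
                           labels = labels$activity)

# Apply the descriptive feature names to the columns
colnames(dataset) <- c("subject", "activity", featuresMeanAndStdNames)

str(dataset)
```

Stripping the hyphens and parentheses is one possible design choice: it yields syntactically valid R names, so the columns can be used directly in formulas such as `. ~ subject + activity`.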
The script uses the following variables:
- `labels`: data set of activity labels
- `features`: data set of features
- `featuresMeanAndStdList`: ids of the features with 'mean' or 'std' in their name
- `trainSet`: train set filtered to the desired features
- `trainLabels`: activity label values for the train set
- `trainSubjects`: subjects of the train set
- `train`: data set combining subjects, activities and the train set
- `testSet`: test set filtered to the desired features
- `testLabels`: activity label values for the test set
- `testSubjects`: subjects of the test set
- `test`: data set combining subjects, activities and the test set
- `dataset`: merged data set
- `means`: data set of feature means aggregated by subject and activity
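The computation of `means` can be sketched as follows, assuming a small made-up `dataset` that is already merged and labelled. This uses the formula interface of base R's `aggregate()`, where `.` stands for every column not named on the right-hand side. The output file name used by `run_analysis.R` is not stated here, so `tempfile()` stands in for it (hypothetical path).

```r
# Hypothetical merged and labelled data set: subject id, activity name
# and two feature columns (values made up for illustration)
dataset <- data.frame(
  subject       = c(1, 1, 1, 2, 2),
  activity      = factor(c("WALKING", "WALKING", "SITTING", "WALKING", "SITTING")),
  tBodyAccMeanX = c(0.2, 0.4, 0.1, 0.3, 0.5),
  tBodyAccStdX  = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Average every remaining column ('.') for each subject/activity pair
means <- aggregate(. ~ subject + activity, data = dataset, FUN = mean)

# Write the tidy data set; tempfile() stands in for the real output path
tidyFile <- tempfile(fileext = ".txt")
write.table(means, file = tidyFile, row.names = FALSE)

print(means)
```

Each row of `means` holds one subject and activity pair, and each feature column holds the average of that feature over the matching rows of `dataset`.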