Getting and Cleaning Data -- Course Project

This repository contains the processing code for producing tidy datasets from the raw dataset at http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

In order to produce the tidy datasets, I do the following:

1. Read the measurement features from the file features.txt, recording the index and name of each measurement.
2. Subset the features to those whose names contain either the text "mean" or the text "std", which signify that they are an arithmetic mean or a standard deviation of the corresponding series.
3. Read the descriptive activity labels from the file activity_labels.txt.
4. For the test dataset, read the subject of every test observation from the file subject_*.txt.
5. From the test dataset, read the measurement values from the file X_*.txt. Their columns correspond to all the features read in step 1, before the subsetting. The values are then projected onto only the features kept after the subsetting in step 2.
6. Read the activity of every test observation from the file y_*.txt, and replace each numeric activity code with its textual label from step 3.
7. Combine the columns from steps 4, 5, and 6 into one table in which each row is an observation and each column is a variable.
8. Repeat steps 4 to 7 for the train dataset.
9. Combine the tables from steps 7 and 8 vertically (the train rows are appended below the test rows). This is the final tidy dataset: tidy_data.csv
10. Produce the separate dataset tidy_data-averaged.csv from the step 9 dataset by computing the arithmetic mean of each measurement variable within each subject and activity group.
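The processing code itself is not reproduced in this README. The following is a minimal Python sketch of steps 1 to 9, not the repository's own script.

It assumes the downloaded archive is unpacked into a root directory that holds features.txt and activity_labels.txt, with test/ and train/ subdirectories containing subject_test.txt, X_test.txt, y_test.txt and the corresponding train files. The helper names (read_pairs, read_column, load_split, build_tidy) are hypothetical. Columns are ordered as subject, the selected measurements, then activity, following steps 4, 5, and 6.

```python
import csv
import os


def read_pairs(path):
    """Read 'index name' lines (features.txt, activity_labels.txt) as (int, str) pairs."""
    pairs = []
    with open(path) as f:
        for line in f:
            if line.strip():
                idx, name = line.split(maxsplit=1)
                pairs.append((int(idx), name.strip()))
    return pairs


def read_column(path):
    """Read a file with one value per line (subject_*.txt, y_*.txt)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def load_split(root, split, keep, labels):
    """Steps 4-7 for one split ("test" or "train"); directory layout is assumed."""
    folder = os.path.join(root, split)
    subjects = read_column(os.path.join(folder, f"subject_{split}.txt"))  # step 4
    activities = read_column(os.path.join(folder, f"y_{split}.txt"))      # step 6
    with open(os.path.join(folder, f"X_{split}.txt")) as f:               # step 5
        # values are whitespace-separated; split() also drops leading spaces
        measurements = [line.split() for line in f if line.strip()]
    rows = []
    for subject, activity, values in zip(subjects, activities, measurements):
        # feature indices are 1-based; keep only the mean/std columns
        selected = [values[idx - 1] for idx, _ in keep]
        # step 7: one row per observation, activity code replaced by its label
        rows.append([subject] + selected + [labels[int(activity)]])
    return rows


def build_tidy(root, out_path):
    features = read_pairs(os.path.join(root, "features.txt"))             # step 1
    keep = [(i, n) for i, n in features if "mean" in n or "std" in n]     # step 2
    labels = dict(read_pairs(os.path.join(root, "activity_labels.txt")))  # step 3
    # steps 8-9: repeat for train, then append the train rows below the test rows
    rows = load_split(root, "test", keep, labels) + load_split(root, "train", keep, labels)
    header = ["subject"] + [name for _, name in keep] + ["activity"]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return header, rows
```

Measurement values are kept as the original strings. They are copied to the CSV unchanged rather than being round-tripped through floats.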

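Step 10 can be sketched in the same hedged way. The hypothetical helper average_by_subject_activity below makes two assumptions: tidy_data.csv has a header row with columns named subject and activity, and every other column holds a numeric measurement. It groups the rows by the (subject, activity) pair and writes the arithmetic mean of each measurement per group.

```python
import csv
from collections import defaultdict
from statistics import mean


def average_by_subject_activity(in_path, out_path):
    """Average every measurement column of the tidy CSV per (subject, activity) pair."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        # every column other than the two grouping keys is a measurement variable
        measures = [c for c in reader.fieldnames if c not in ("subject", "activity")]
        groups = defaultdict(list)
        for row in reader:
            groups[(row["subject"], row["activity"])].append(row)

    result = []
    # sort numerically by subject id, then alphabetically by activity label
    for subject, activity in sorted(groups, key=lambda k: (int(k[0]), k[1])):
        rows = groups[(subject, activity)]
        averages = [mean(float(r[m]) for r in rows) for m in measures]
        result.append([subject, activity] + averages)

    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["subject", "activity"] + measures)
        writer.writerows(result)
    return result
```

Placing subject and activity first and sorting the groups are presentation choices made for this sketch only.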
mcanabalb/coursera-getdata
