The gettingandcleaning repository contains the following files required by the project:

File Name | Description |
---|---|
README.md | Readme file that lists each file in the repository with a brief description |
run_analysis.R | R script that defines the run_analysis function, which creates the tidy data set |
CodeBook.md | Contains a step-by-step explanation of the run_analysis function and descriptions of the variable names in the final output file final_tidy_data.txt |
final_tidy_data.txt | Tidy data set produced by running run_analysis() |
final_labeled_data.txt | Labeled data set written just before the final step that produces the tidy data |

The repository also contains the following data files that were supplied for the project:

Data File | Description |
---|---|
features_info.txt | Shows information about the variables used on the feature vector |
features.txt | List of all features |
activity_labels.txt | Links the class labels with their activity name |
X_train.txt | Training set |
y_train.txt | Training labels |
subject_train.txt | Each row identifies the subject who performed the train activity for each window sample |
X_test.txt | Test set |
y_test.txt | Test labels |
subject_test.txt | Each row identifies the subject who performed the test activity for each window sample |

The function `run_analysis` accepts the working directory as its input parameter `workingdir`. The parameter defaults to the current directory. The function then performs the following steps:

- Sets the working directory.
- Loads the data.table library.
- Reads the activity_labels.txt file into the `activity_labels` data table and names the two columns `activity_id` and `activity_name`.
- Reads the features.txt file into the `features` data table and names the two columns `feature_id` and `feature_desc`.
- Reads the X test and train data and combines them into the `Xcombined` data table. The columns are named from the `feature_desc` column of the `features` data table.
- Selects only the mean and standard deviation measurements from the `Xcombined` data table by searching the column names for the mean or std pattern. The resulting set is named `meanstd`.
- Reads the y test and train data and combines them into the `ycombined` data table. The column is named `activity_id`.
- Reads the subject test and train data and combines them into the `subjectcombined` data table. The column is named `subject_id`.
- Combines `meanstd`, `ycombined` and `subjectcombined` into a single data table called `cd`.
- Merges `cd` and `activity_labels` on `activity_id`.
- Removes the activity_id column, since activity_name is now present, and names the result `cdt`.
- Writes `cdt` to a file called `final_labeled_data.txt` with TAB as delimiter.
- Creates a new tidy data set holding the mean of every other variable for each activity_name and subject, and writes it to `final_tidy_data.txt` with TAB as delimiter.
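
The steps above can be sketched on a small toy data set. This is only an illustration under stated assumptions, not the actual contents of run_analysis.R: the measurement values are made up, the inputs are built in memory instead of being read from the data files, and the final result is stored in a hypothetical variable `tidy`. Note that a plain `mean|std` pattern would also match names such as `meanFreq` if they are present.

```r
library(data.table)

# Toy stand-ins for the real inputs (all values are made up for illustration).
features <- data.table(
  feature_id   = 1:4,
  feature_desc = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "tBodyAcc-max()-X", "tBodyGyro-mean()-X")
)
activity_labels <- data.table(
  activity_id   = 1:2,
  activity_name = c("WALKING", "SITTING")
)

# Four observations standing in for the combined X data,
# with columns named from feature_desc.
Xcombined <- data.table(
  c(1, 3, 5, 7), c(0.1, 0.3, 0.5, 0.7),
  c(9, 9, 9, 9), c(2, 4, 6, 8)
)
setnames(Xcombined, features$feature_desc)

ycombined       <- data.table(activity_id = c(1L, 1L, 2L, 2L))
subjectcombined <- data.table(subject_id  = c(1L, 1L, 1L, 1L))

# Keep only the columns whose names contain "mean" or "std".
keep    <- grep("mean|std", names(Xcombined), value = TRUE)
meanstd <- Xcombined[, keep, with = FALSE]

# Bind measurements, activity ids and subject ids column-wise.
cd <- cbind(meanstd, ycombined, subjectcombined)

# Attach the activity names, then drop the now-redundant id column.
cdt <- merge(cd, activity_labels, by = "activity_id")
cdt[, activity_id := NULL]

# Average every remaining variable per activity and subject.
tidy <- cdt[, lapply(.SD, mean), by = .(activity_name, subject_id)]
print(tidy)
```

The `max()` column is dropped by the pattern match, and the result has one row per activity and subject pair.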

Running `run_analysis` from the R console should produce two TAB-delimited output files in the working directory: `final_labeled_data.txt` and `final_tidy_data.txt`.
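
As a quick check of the TAB-delimited format, the following minimal sketch writes a small table and reads it back. It rests on assumptions: the table contents and the `avg_value` column are hypothetical, the file goes to a temporary directory so that nothing in the repository is overwritten, and `fwrite` is used here for brevity even though the actual script may write its output differently (for example with `write.table`).

```r
library(data.table)

# Hypothetical stand-in for the tidy result (values are made up).
tidy <- data.table(
  activity_name = c("WALKING", "SITTING"),
  subject_id    = c(1L, 1L),
  avg_value     = c(2.5, 6.5)
)

# Write with TAB as the delimiter; a temp directory avoids side effects.
out_file <- file.path(tempdir(), "final_tidy_data.txt")
fwrite(tidy, out_file, sep = "\t")

# Read the file back and inspect its columns and rows.
check <- fread(out_file, sep = "\t")
print(check)
```

The same `fread` call, pointed at the real `final_tidy_data.txt` in the working directory, can be used to inspect the output of `run_analysis`.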