Getting and Cleaning Data

This repository was created as an assignment for the Data Science specialization, for the course Getting and Cleaning Data. The package contains an R script that performs the necessary operations on the data, a README file that describes those operations, and a CODEBOOK file that defines the variables in the data.

The task of the assignment is to read data from files, extract specific columns, label the data, and finally generate a summarized data set. This README file describes:

  • The initial data format
  • The tasks
  • The operations performed for each task
  • The final output form

Initial Data Format

The script takes a single input parameter: the path to the data directory. It assumes that the data is stored in the following directory hierarchy.

UCI HAR Dataset/
├───test/
│   ├───Inertial Signals/
│   │   ├───body_acc_x_test.txt
│   │   ├───body_acc_y_test.txt
│   │   ├───body_acc_z_test.txt
│   │   ├───body_gyro_x_test.txt
│   │   ├───body_gyro_y_test.txt
│   │   ├───body_gyro_z_test.txt
│   │   ├───total_acc_x_test.txt
│   │   ├───total_acc_y_test.txt
│   │   └───total_acc_z_test.txt
│   ├───subject_test.txt
│   ├───X_test.txt
│   └───y_test.txt
├───train/
│   ├───Inertial Signals/
│   │   ├───body_acc_x_train.txt
│   │   ├───body_acc_y_train.txt
│   │   ├───body_acc_z_train.txt
│   │   ├───body_gyro_x_train.txt
│   │   ├───body_gyro_y_train.txt
│   │   ├───body_gyro_z_train.txt
│   │   ├───total_acc_x_train.txt
│   │   ├───total_acc_y_train.txt
│   │   └───total_acc_z_train.txt
│   ├───subject_train.txt
│   ├───X_train.txt
│   └───y_train.txt
├───activity_labels.txt
├───features.txt
├───features_info.txt
├───README.txt
└───required_features.txt
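
The layout above can be turned into file paths with a small helper. The following is a minimal sketch, not the repository's actual script: `har_paths` and `missing_har_files` are hypothetical names, and it assumes that only the files listed under Data Dimensions are read (the Inertial Signals files are left out).

```r
# Hypothetical helper: build the paths the script is expected to read,
# given the path to the data directory.
har_paths <- function(data_dir) {
  list(
    x_train       = file.path(data_dir, "train", "X_train.txt"),
    y_train       = file.path(data_dir, "train", "y_train.txt"),
    subject_train = file.path(data_dir, "train", "subject_train.txt"),
    x_test        = file.path(data_dir, "test", "X_test.txt"),
    y_test        = file.path(data_dir, "test", "y_test.txt"),
    subject_test  = file.path(data_dir, "test", "subject_test.txt"),
    activities    = file.path(data_dir, "activity_labels.txt"),
    features      = file.path(data_dir, "features.txt"),
    required      = file.path(data_dir, "required_features.txt")
  )
}

# Hypothetical helper: return the expected files that do not exist
# (an empty character vector when nothing is missing).
missing_har_files <- function(data_dir) {
  paths <- unlist(har_paths(data_dir))
  unname(paths[!file.exists(paths)])
}
```

Calling `missing_har_files("UCI HAR Dataset")` before loading anything gives an early, readable error source if the directory is incomplete.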

Data Dimensions

| File Name | Description | # of Rows | # of Columns |
|---|---|---|---|
| X_train.txt | Training data set for all features | 7352 | 561 |
| X_test.txt | Test data set for all features | 2947 | 561 |
| y_train.txt | Activity id for each row of the training set | 7352 | 1 |
| y_test.txt | Activity id for each row of the test set | 2947 | 1 |
| subject_train.txt | Subject id for each row of the training set | 7352 | 1 |
| subject_test.txt | Subject id for each row of the test set | 2947 | 1 |
| activity_labels.txt | Activity id along with its name text | 6 | 2 |
| features.txt | List of all features | 561 | 2 |
| required_features.txt | Short list of features with only mean() & std() values | 79 | 2 |
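
These dimensions can be checked while loading. The sketch below is an illustration only: `read_checked` is a hypothetical helper, and the demonstration uses a tiny temporary file shaped like activity_labels.txt rather than the real data.

```r
# Hypothetical helper: read a whitespace-delimited file without a header
# and stop if its shape differs from the expected one.
read_checked <- function(path, n_rows, n_cols) {
  df <- read.table(path, header = FALSE)
  stopifnot(nrow(df) == n_rows, ncol(df) == n_cols)
  df
}

# Self-contained demonstration on a temporary file (3 rows, 2 columns).
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 WALKING", "2 WALKING_UPSTAIRS", "3 WALKING_DOWNSTAIRS"), tmp)
labels <- read_checked(tmp, n_rows = 3, n_cols = 2)
```

With the real data, the same call would look like `read_checked(file.path(data_dir, "train", "X_train.txt"), 7352, 561)`, where `data_dir` is the path passed to the script. After merging, the combined feature data should have 7352 + 2947 = 10299 rows.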

Tasks

  • Step 1: Load the training data and the test data, and merge them
  • Step 2: Extract only the columns that contain the mean() or std() of each measurement
  • Step 3: Use the activity name text instead of the activity id
  • Step 4: Label the data set with a descriptive name for each variable
  • Step 5: Create a second data set with the mean of each variable for each activity and each subject
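
The five steps can be sketched on toy data as follows. This is an illustration under stated assumptions, not the repository's script: all values are made up, the columns are selected by pattern matching rather than by reading required_features.txt, and base R's `aggregate` is used for the grouping step.

```r
# Toy stand-ins for the real files; all values are made up for illustration.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))

x_train <- as.data.frame(matrix(c(0.1, 0.3, 0.5, 0.7, 9, 9), nrow = 2))
y_train <- data.frame(activity = c(1, 1))
subject_train <- data.frame(subject = c(1, 1))

x_test <- as.data.frame(matrix(c(0.2, 0.4, 0.6, 0.8, 9, 9), nrow = 2))
y_test <- data.frame(activity = c(2, 2))
subject_test <- data.frame(subject = c(2, 2))

# Step 1: stack the training rows on top of the test rows.
x <- rbind(x_train, x_test)
y <- rbind(y_train, y_test)
subject <- rbind(subject_train, subject_test)

# Step 2: keep only the mean/std columns (pattern matching is an assumption;
# the repository lists the wanted features in required_features.txt).
keep <- grepl("mean|std", features)
x <- x[, keep, drop = FALSE]

# Step 3: replace activity ids with their names.
activity <- factor(y$activity, levels = activity_labels$id,
                   labels = activity_labels$name)

# Step 4: descriptive, syntactically valid column names,
# e.g. "tBodyAcc-mean()-X" becomes "tBodyAcc.mean.X".
names(x) <- gsub("-", ".", gsub("()", "", features[keep], fixed = TRUE),
                 fixed = TRUE)

# Step 5: mean of every variable for each subject and activity.
merged <- cbind(subject, activity = activity, x)
tidy <- aggregate(. ~ subject + activity, data = merged, FUN = mean)
```

Here `tidy` has one row per subject-activity pair that occurs in the data, with the two identifier columns followed by the averaged features.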

Final Output

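The final output contains 180 rows and 81 columns: one row per subject-activity pair (30 subjects × 6 activities), with the subject, the activity, and the 79 averaged features. Here is a summary of the data.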

# > str(final_tidy_data)
##Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':	180 obs. of  81 variables:
## $ subject                      : int  1 1 1 1 1 1 2 2 2 2 ...
## $ activity                     : Factor w/ 6 levels "WALKING","WALKING_UPSTAIRS",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ tBodyAcc.mean.X              : num  0.277 0.255 0.289 0.261 0.279 ...
## $ tBodyAcc.mean.Y              : num  -0.01738 -0.02395 -0.00992 -0.00131 -0.01614 ...
## $ tBodyAcc.mean.Z              : num  -0.1111 -0.0973 -0.1076 -0.1045 -0.1106 ...
## $ tBodyAcc.std.X               : num  -0.284 -0.355 0.03 -0.977 -0.996 ...
## $ tBodyAcc.std.Y               : num  0.11446 -0.00232 -0.03194 -0.92262 -0.97319 ...
## $ tBodyAcc.std.Z               : num  -0.26 -0.0195 -0.2304 -0.9396 -0.9798 ...
## $ tGravityAcc.mean.X           : num  0.935 0.893 0.932 0.832 0.943 ...
## $ tGravityAcc.mean.Y           : num  -0.282 -0.362 -0.267 0.204 -0.273 ...
## $ tGravityAcc.mean.Z           : num  -0.0681 -0.0754 -0.0621 0.332 0.0135 ...
## $ tGravityAcc.std.X            : num  -0.977 -0.956 -0.951 -0.968 -0.994 ...
## $ tGravityAcc.std.Y            : num  -0.971 -0.953 -0.937 -0.936 -0.981 ...
## $ tGravityAcc.std.Z            : num  -0.948 -0.912 -0.896 -0.949 -0.976 ...
## $ tBodyAccJerk.mean.X          : num  0.074 0.1014 0.0542 0.0775 0.0754 ...
## $ tBodyAccJerk.mean.Y          : num  0.028272 0.019486 0.02965 -0.000619 0.007976 ...
## $ tBodyAccJerk.mean.Z          : num  -0.00417 -0.04556 -0.01097 -0.00337 -0.00369 ...
## $ tBodyAccJerk.std.X           : num  -0.1136 -0.4468 -0.0123 -0.9864 -0.9946 ...
## $ tBodyAccJerk.std.Y           : num  0.067 -0.378 -0.102 -0.981 -0.986 ...
## $ tBodyAccJerk.std.Z           : num  -0.503 -0.707 -0.346 -0.988 -0.992 ...
## $ tBodyGyro.mean.X             : num  -0.0418 0.0505 -0.0351 -0.0454 -0.024 ...
## $ tBodyGyro.mean.Y             : num  -0.0695 -0.1662 -0.0909 -0.0919 -0.0594 ...
## $ tBodyGyro.mean.Z             : num  0.0849 0.0584 0.0901 0.0629 0.0748 ...
## $ tBodyGyro.std.X              : num  -0.474 -0.545 -0.458 -0.977 -0.987 ...
## $ tBodyGyro.std.Y              : num  -0.05461 0.00411 -0.12635 -0.96647 -0.98773 ...
## $ tBodyGyro.std.Z              : num  -0.344 -0.507 -0.125 -0.941 -0.981 ...
## $ tBodyGyroJerk.mean.X         : num  -0.09 -0.1222 -0.074 -0.0937 -0.0996 ...
## $ tBodyGyroJerk.mean.Y         : num  -0.0398 -0.0421 -0.044 -0.0402 -0.0441 ...
## $ tBodyGyroJerk.mean.Z         : num  -0.0461 -0.0407 -0.027 -0.0467 -0.049 ...
## $ tBodyGyroJerk.std.X          : num  -0.207 -0.615 -0.487 -0.992 -0.993 ...
## $ tBodyGyroJerk.std.Y          : num  -0.304 -0.602 -0.239 -0.99 -0.995 ...
## $ tBodyGyroJerk.std.Z          : num  -0.404 -0.606 -0.269 -0.988 -0.992 ...
## $ tBodyAccMag.mean             : num  -0.137 -0.1299 0.0272 -0.9485 -0.9843 ...
## $ tBodyAccMag.std              : num  -0.2197 -0.325 0.0199 -0.9271 -0.9819 ...
## $ tGravityAccMag.mean          : num  -0.137 -0.1299 0.0272 -0.9485 -0.9843 ...
## $ tGravityAccMag.std           : num  -0.2197 -0.325 0.0199 -0.9271 -0.9819 ...
## $ tBodyAccJerkMag.mean         : num  -0.1414 -0.4665 -0.0894 -0.9874 -0.9924 ...
## $ tBodyAccJerkMag.std          : num  -0.0745 -0.479 -0.0258 -0.9841 -0.9931 ...
## $ tBodyGyroMag.mean            : num  -0.161 -0.1267 -0.0757 -0.9309 -0.9765 ...
## $ tBodyGyroMag.std             : num  -0.187 -0.149 -0.226 -0.935 -0.979 ...
## $ tBodyGyroJerkMag.mean        : num  -0.299 -0.595 -0.295 -0.992 -0.995 ...
## $ tBodyGyroJerkMag.std         : num  -0.325 -0.649 -0.307 -0.988 -0.995 ...
## $ fBodyAcc.mean.X              : num  -0.2028 -0.4043 0.0382 -0.9796 -0.9952 ...
## $ fBodyAcc.mean.Y              : num  0.08971 -0.19098 0.00155 -0.94408 -0.97707 ...
## $ fBodyAcc.mean.Z              : num  -0.332 -0.433 -0.226 -0.959 -0.985 ...
## $ fBodyAcc.std.X               : num  -0.3191 -0.3374 0.0243 -0.9764 -0.996 ...
## $ fBodyAcc.std.Y               : num  0.056 0.0218 -0.113 -0.9173 -0.9723 ...
## $ fBodyAcc.std.Z               : num  -0.28 0.086 -0.298 -0.934 -0.978 ...
## $ fBodyAcc.meanFreq.X          : num  -0.2075 -0.4187 -0.3074 -0.0495 0.0865 ...
## $ fBodyAcc.meanFreq.Y          : num  0.1131 -0.1607 0.0632 0.0759 0.1175 ...
## $ fBodyAcc.meanFreq.Z          : num  0.0497 -0.5201 0.2943 0.2388 0.2449 ...
## $ fBodyAccJerk.mean.X          : num  -0.1705 -0.4799 -0.0277 -0.9866 -0.9946 ...
## $ fBodyAccJerk.mean.Y          : num  -0.0352 -0.4134 -0.1287 -0.9816 -0.9854 ...
## $ fBodyAccJerk.mean.Z          : num  -0.469 -0.685 -0.288 -0.986 -0.991 ...
## $ fBodyAccJerk.std.X           : num  -0.1336 -0.4619 -0.0863 -0.9875 -0.9951 ...
## $ fBodyAccJerk.std.Y           : num  0.107 -0.382 -0.135 -0.983 -0.987 ...
## $ fBodyAccJerk.std.Z           : num  -0.535 -0.726 -0.402 -0.988 -0.992 ...
## $ fBodyAccJerk.meanFreq.X      : num  -0.209 -0.377 -0.253 0.257 0.314 ...
## $ fBodyAccJerk.meanFreq.Y      : num  -0.3862 -0.5095 -0.3376 0.0475 0.0392 ...
## $ fBodyAccJerk.meanFreq.Z      : num  -0.18553 -0.5511 0.00937 0.09239 0.13858 ...
## $ fBodyGyro.mean.X             : num  -0.339 -0.493 -0.352 -0.976 -0.986 ...
## $ fBodyGyro.mean.Y             : num  -0.1031 -0.3195 -0.0557 -0.9758 -0.989 ...
## $ fBodyGyro.mean.Z             : num  -0.2559 -0.4536 -0.0319 -0.9513 -0.9808 ...
## $ fBodyGyro.std.X              : num  -0.517 -0.566 -0.495 -0.978 -0.987 ...
## $ fBodyGyro.std.Y              : num  -0.0335 0.1515 -0.1814 -0.9623 -0.9871 ...
## $ fBodyGyro.std.Z              : num  -0.437 -0.572 -0.238 -0.944 -0.982 ...
## $ fBodyGyro.meanFreq.X         : num  0.0148 -0.1875 -0.1005 0.1892 -0.1203 ...
## $ fBodyGyro.meanFreq.Y         : num  -0.0658 -0.4736 0.0826 0.0631 -0.0447 ...
## $ fBodyGyro.meanFreq.Z         : num  0.000773 -0.133374 -0.075676 -0.029784 0.100608 ...
## $ fBodyAccMag.mean             : num  -0.1286 -0.3524 0.0966 -0.9478 -0.9854 ...
## $ fBodyAccMag.std              : num  -0.398 -0.416 -0.187 -0.928 -0.982 ...
## $ fBodyAccMag.meanFreq         : num  0.1906 -0.0977 0.1192 0.2367 0.2846 ...
## $ fBodyBodyAccJerkMag.mean     : num  -0.0571 -0.4427 0.0262 -0.9853 -0.9925 ...
## $ fBodyBodyAccJerkMag.std      : num  -0.103 -0.533 -0.104 -0.982 -0.993 ...
## $ fBodyBodyAccJerkMag.meanFreq : num  0.0938 0.0854 0.0765 0.3519 0.4222 ...
## $ fBodyBodyGyroMag.mean        : num  -0.199 -0.326 -0.186 -0.958 -0.985 ...
## $ fBodyBodyGyroMag.std         : num  -0.321 -0.183 -0.398 -0.932 -0.978 ...
## $ fBodyBodyGyroMag.meanFreq    : num  0.268844 -0.219303 0.349614 -0.000262 -0.028606 ...
## $ fBodyBodyGyroJerkMag.mean    : num  -0.319 -0.635 -0.282 -0.99 -0.995 ...
## $ fBodyBodyGyroJerkMag.std     : num  -0.382 -0.694 -0.392 -0.987 -0.995 ...
## $ fBodyBodyGyroJerkMag.meanFreq: num  0.191 0.114 0.19 0.185 0.334 ...
## - attr(*, "vars")=List of 1
##  ..$ : symbol subject
## - attr(*, "drop")= logi TRUE
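
The shape of such an output can be validated with a short check. The sketch below is an assumption-laden illustration: `check_tidy` is a hypothetical function, and the toy data frame stands in for `final_tidy_data`.

```r
# Hypothetical validator for a tidy summary with "subject" and "activity"
# identifier columns followed by numeric feature columns.
check_tidy <- function(df, n_subjects, n_activities, n_features) {
  df <- as.data.frame(df)  # drop any grouping attributes
  feature_cols <- setdiff(names(df), c("subject", "activity"))
  stopifnot(
    !any(duplicated(df[, c("subject", "activity")])),  # one row per pair
    nrow(df) == n_subjects * n_activities,
    ncol(df) == n_features + 2,                        # + subject, activity
    all(vapply(df[feature_cols], is.numeric, logical(1)))
  )
  invisible(TRUE)
}

# Toy summary: 2 subjects x 2 activities, 1 feature (values are made up).
toy <- expand.grid(subject = 1:2, activity = c("WALKING", "SITTING"))
toy$tBodyAcc.mean.X <- c(0.1, 0.2, 0.3, 0.4)
check_tidy(toy, n_subjects = 2, n_activities = 2, n_features = 1)
```

For the output described here, the equivalent call would be `check_tidy(final_tidy_data, 30, 6, 79)`, which corresponds to 30 × 6 = 180 rows and 79 + 2 = 81 columns.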