Getting and Cleaning Data Course Project
Prerequisites
- This program uses the data.table package to compute the tidy dataset.
- If data.table is not installed, install it by executing install.packages("data.table").
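The install step can be scripted so that it only runs when the package is missing. This is a minimal sketch, assuming your R session already has a CRAN mirror configured:

```r
# Install data.table only if it is not already available, then load it.
if (!requireNamespace("data.table", quietly = TRUE)) {
  install.packages("data.table")
}
library(data.table)
```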
Structure of the program
- The file run_analysis.R mainly contains the following three functions:
** Function 'create_complete_model' - generates a model that combines all the datasets into a single dataset
** Function 'model_with_mean_and_stddev' - generates a model whose column names contain either mean or stddev
** Function 'create_tidy_dataset' - generates a model that groups by subject_id and activity and takes the mean of all the columns whose column name does not contain mean or stddev
- All three functions take a logical argument called 'output_model'. If a function is called with output_model = TRUE, its model is also written as a file in the directory named 'output'.
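To illustrate how a logical flag such as 'output_model' can control file output, here is a small sketch. The helper name save_model, its out_dir parameter, and the toy data are hypothetical and are not taken from run_analysis.R; the real functions may be written differently.

```r
# Hypothetical helper, not part of run_analysis.R: writes the model to disk
# only when output_model is TRUE.
save_model <- function(model, file_name, output_model = FALSE, out_dir = "output") {
  if (output_model) {
    # Create the directory if needed; an existing file of the same name is overwritten.
    dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
    write.csv(model, file.path(out_dir, file_name), row.names = FALSE)
  }
  invisible(model)  # return the model either way so the caller can keep using it
}

# Toy data and a temporary directory, used here only for demonstration.
toy <- data.frame(subject_id = c(1, 2), activity = c("WALKING", "SITTING"))
tmp <- file.path(tempdir(), "output")
save_model(toy, "toy.csv", output_model = TRUE, out_dir = tmp)
file.exists(file.path(tmp, "toy.csv"))
```

Returning the model invisibly keeps the call usable both for its side effect and in an assignment such as model1 = create_complete_model(output_model = TRUE).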
How to run the program
- Check out the code into a directory.
- Open RStudio.
- Set the RStudio working directory to the directory where the code is checked out.
- Source the run_analysis.R file by executing the command source("./run_analysis.R")
- The output datasets are checked into the repository for verification.
- If files are already present in the output directory, running the program overwrites them.
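The steps above can also be run from the R console. In this sketch, "path/to/checkout" is a placeholder, not a real path; replace it with the directory where you checked out the code. The guard simply skips the run when run_analysis.R is not found there.

```r
# Placeholder path: replace with your own checkout directory.
project_dir <- "path/to/checkout"

if (file.exists(file.path(project_dir, "run_analysis.R"))) {
  setwd(project_dir)           # make the checkout the working directory
  source("./run_analysis.R")   # load (and run) the analysis script
  print(list.files("output"))  # list whatever is present in the output directory
}
```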
How to verify the course project
- To verify the first question: merge all datasets
- Execute the following command in RStudio: model1 = create_complete_model()
- If the function is executed with the optional argument output_model = TRUE, the output is written to the output directory with the file name 'all_observations.csv'.
- str(model1, list.len=999) and dim(model1) show the fields and the number of rows and columns of the merged dataset.
- To verify the second question: extract the mean and standard deviation fields
- Execute the following command in RStudio: model2 = model_with_mean_and_stddev()
- If the command is executed with the argument output_model = TRUE, the output is written to the output directory with the file name 'observations_with_mean_and_stddev.csv'.
- str(model2) and dim(model2) show the fields and the number of rows and columns of the dataset that contains only the mean and standard deviation fields.
- To verify the third question: use descriptive activity names
- Both model1 and model2, created by the commands above, contain a column named 'activity'. The output is also checked into the GitHub repository in the output directory.
- The output directory contains all_observations.csv, whose 'activity' column holds the activity description.
- To verify the fifth question: independent tidy dataset
- Execute the following command in RStudio: model3 = create_tidy_dataset()
- If the command is executed with the argument output_model = TRUE, the file "tidy_dataset.txt" is generated in the output directory.
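The verification steps above rely on two operations: selecting columns by name and averaging per subject and activity. The sketch below shows both with data.table on a toy table. The column names and values are made up for illustration, and which columns the real functions keep or average is determined by run_analysis.R, not by this example.

```r
library(data.table)

# Toy stand-in for the merged dataset; names and values are illustrative only.
obs <- data.table(
  subject_id = c(1, 1, 2, 2),
  activity   = c("WALKING", "WALKING", "SITTING", "SITTING"),
  acc_mean_x = c(0.2, 0.4, 0.1, 0.3),
  acc_std_x  = c(1, 3, 2, 4),
  acc_max_x  = c(9, 9, 9, 9)
)

# Keep the id columns plus any column whose name mentions mean or std.
keep <- grep("subject_id|activity|mean|std", names(obs), value = TRUE)
subset_obs <- obs[, keep, with = FALSE]
dim(subset_obs)  # number of rows, number of columns

# Average every measurement column per subject and activity.
tidy <- obs[, lapply(.SD, mean), by = .(subject_id, activity)]
str(tidy)
```

Here .SD holds all columns not named in by, so each measurement column is reduced to one mean per (subject_id, activity) pair.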