Peer-graded Assignment: Getting and Cleaning Data Course Project
This repository contains the scripts and documentation needed to create multiple tidy datasets from the data published at [http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones], following the cleaning requirements specified in the Week 4 peer-graded assignment of the "Getting and Cleaning Data" course on Coursera.
The following steps are performed automatically by executing the run_analysis.R script:

- Download and extract the data from [https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip] into the root of the repository directory (this creates a UCI HAR Dataset directory). The script skips this step if the directory already exists.
- Apply a couple of denormalisations and inlines as well as some filtering, while combining the train and test datasets into a single dataset, to make the original data more accessible (see CodeBook.md or run_analysis.R for details on the operations performed).
- Save the new dataset to dist/dataset.txt (see CodeBook.md for a description of the file format and how to load the data again).
- Create a new dataset that averages each feature by test subject and activity by applying R's mean function (see run_analysis.R for the exact R code).
- Save the new dataset to dist/averages.txt (using the same file format).
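
The averaging step can be illustrated on a toy data frame. This is only a minimal sketch, not the code in run_analysis.R: the column names (subject, activity, feature_a, feature_b) and the values are made up for illustration, and base R's aggregate is just one possible way to apply mean per group.

```r
# Toy stand-in for the tidy dataset. The column names and values are
# assumptions for illustration, not the ones documented in CodeBook.md.
tidy <- data.frame(
  subject   = c(1, 1, 1, 2, 2),
  activity  = c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
  feature_a = c(0.2, 0.4, 0.1, 0.5, 0.7),
  feature_b = c(1.0, 3.0, 2.0, 4.0, 6.0)
)

# Average every remaining column for each (subject, activity) pair.
# The "." on the left-hand side stands for all columns not used for grouping.
averages <- aggregate(. ~ subject + activity, data = tidy, FUN = mean)

print(averages)
```

The result has one row per subject and activity combination that occurs in the input, with each feature column replaced by its group mean.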
run_analysis.R tries to conserve bandwidth and processing time, so it will:
- NOT download the base dataset if the UCI HAR Dataset directory already exists.
- NOT recompute the tidy dataset (before computing the averages dataset) if dist/dataset.txt already exists.

Delete the appropriate files or folders if you need them re-downloaded or re-computed.
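
The skip logic described above can be sketched with plain existence checks. The helper names ensure_raw_data and needs_rebuild are hypothetical and are not taken from run_analysis.R; the paths and the URL are the ones mentioned in this README.

```r
data_dir  <- "UCI HAR Dataset"
tidy_file <- file.path("dist", "dataset.txt")
data_url  <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Hypothetical helper: download and extract the archive only when the
# data directory is missing. Returns (invisibly) TRUE if a download happened.
ensure_raw_data <- function(dir = data_dir, url = data_url) {
  if (dir.exists(dir)) {
    return(invisible(FALSE))  # directory present, nothing to download
  }
  zip_path <- tempfile(fileext = ".zip")
  download.file(url, zip_path, mode = "wb")
  unzip(zip_path, exdir = ".")  # assumes the archive unpacks into `dir`
  invisible(TRUE)
}

# Hypothetical helper: the tidy dataset needs recomputing only when
# its output file does not exist yet.
needs_rebuild <- function(path = tidy_file) {
  !file.exists(path)
}
```

With checks of this kind, deleting the UCI HAR Dataset directory or dist/dataset.txt is enough to force the corresponding step to run again on the next execution.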