Course Project
"Getting and Cleaning Data" course on Coursera
Files in this repository
CodeBook.md
is the file describing the data and transformations applied to itREADME.md
is the file you're reading right nowrun_analysis.R
is the script that performs the data transformation and analysis
How to run the script
You should be able to simply run it without problems:
~ $ git clone https://github.com/pwseo/GetdataCourseProject.git
~ $ cd GetdataCourseProject
~/GetdataCourseProject $ Rscript run_analysis.R
You can also run the script from a currently running R session:
> source('run_analysis.R')
If you're running this script on Windows, please make sure you're aware of your current directory -- that's where the data will be downloaded to and also where the results will be saved.
You can check your current directory with the getwd()
function and set it with setwd()
.
If you're using RStudio you can also set it by clicking Session > Set working directory > Choose directory... in the menus.
When the script is done running, the results will be saved to the working directory in a file named final.txt
.
What the script does
The script was written to function without interaction from the user. Breaking it down to steps, the script does the following:
- Download the datasets' zip file from the internet.
- Extract the downloaded zip file (creating a
UCI HAR Dataset
directory). - Assemble the
train
andtest
datasets from their individual components. - Merge the
train
andtest
datasets into a single dataset. - Keep only data related to the means and standard deviations of the original measurements.
- Label the resulting dataset's variables (columns) and activities appropriately.
- Summarize the dataset by averaging every variable (column) grouped by subject and activity.
- Output the dataset to the
final.txt
file.
Packages required
The script was written using only base R functions, so no third-party packages are required.