# Data Cleaning Project for the online course "Getting and Cleaning Data"
This repository hosts the scripts for the online course I am taking on Coursera.
- `run_analysis.R`: R script that processes the raw data provided for the class project. See the Instruction section below for details on how to use the script.
- `README.md`: the current file, which serves as the documentation and instructions for `run_analysis.R`. It also contains the codebook for the output file, which is a tidy version of the raw data.

The raw data were collected by motion sensors attached to human subjects while they carried out six different activities (e.g. walking, laying). A description of the data can be found on the website of the UCI Machine Learning Repository.

Create one R script called `run_analysis.R` that processes the raw data as follows:
- Merges the training and the test sets to create one data set.
- Extracts only the measurements on the mean and standard deviation for each measurement.
- Uses descriptive activity names to name the activities in the data set.
- Appropriately labels the data set with descriptive activity names.
- Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
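
The five steps above can be sketched on a toy data set. This is a minimal illustration, not the actual contents of `run_analysis.R`: the data frames `train` and `test`, their values, the two-level `activity_labels` lookup, and the cleaned column names are all assumptions made up for the example.

```r
# Toy stand-ins for the training and test sets (all values are made up).
train <- data.frame(
  subject = c(1, 1, 2),
  activity = c(1, 2, 1),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.6),
  `tBodyAcc-std()-X` = c(0.1, 0.1, 0.3),
  `tBodyAcc-max()-X` = c(0.9, 0.8, 0.7),
  check.names = FALSE
)
test <- data.frame(
  subject = c(1, 2),
  activity = c(1, 1),
  `tBodyAcc-mean()-X` = c(0.4, 0.8),
  `tBodyAcc-std()-X` = c(0.3, 0.5),
  `tBodyAcc-max()-X` = c(0.5, 0.6),
  check.names = FALSE
)
# Hypothetical lookup: code 1 -> WALKING, code 2 -> LAYING.
activity_labels <- c("WALKING", "LAYING")

# 1. Merge the training and test sets.
merged <- rbind(train, test)

# 2. Keep the id columns plus the mean() and std() measurements.
keep <- grepl("mean\\(\\)|std\\(\\)", names(merged))
selected <- merged[, c("subject", "activity", names(merged)[keep])]

# 3. Replace the activity codes with descriptive names.
selected$activity <- factor(selected$activity,
                            levels = seq_along(activity_labels),
                            labels = activity_labels)

# 4. Clean the variable names: drop "()" and turn "-" into "_".
names(selected) <- gsub("[()]", "", names(selected))
names(selected) <- gsub("-", "_", names(selected))

# 5. Average each variable for each activity and each subject.
tidy <- aggregate(. ~ subject + activity, data = selected, FUN = mean)
print(tidy)
```

Filtering on the literal strings `mean()` and `std()` is one possible reading of the second step; a looser pattern such as `mean|std` would also keep columns like `meanFreq()`.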

## Instruction

Set up the working folders, then download the raw data and the script:

    mkdir ~/dataProject
    cd ~/dataProject
    mkdir tidyData script
    wget https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
    unzip getdata%2Fprojectfiles%2FUCI\ HAR\ Dataset.zip
    ln -s UCI\ HAR\ Dataset/ rawData
    cd ~/dataProject/script
    wget https://raw.githubusercontent.com/conge/dataCleaningCode/master/run_analysis.R

You can run the script under Linux/Unix by starting R at the prompt:

    R

Once in R, enter:

    source('run_analysis.R')

Wait a few minutes and you will see a file named `smartphone.cvs` in the folder `~/dataProject/tidyData`.

The script `run_analysis.R` reads all the training and test data from the raw data folders and combines them into one large data set. The average of each variable is then calculated for each activity and each subject.
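
As a rough sketch of the reading and combining described above, the helper below loads one split from a folder laid out like the UCI HAR Dataset (`<root>/<split>/subject_<split>.txt`, `X_<split>.txt`, `y_<split>.txt`). The function name `read_split` and the tiny demo folder are hypothetical and exist only so the example runs without the real download; the actual script may be organised differently.

```r
# Read one split ("train" or "test") from a UCI HAR style folder.
read_split <- function(root, split) {
  path <- function(prefix) {
    file.path(root, split, paste0(prefix, "_", split, ".txt"))
  }
  subject  <- read.table(path("subject"), col.names = "subject")
  activity <- read.table(path("y"), col.names = "activity")
  features <- read.table(path("X"))  # columns default to V1, V2, ...
  cbind(subject, activity, features)
}

# Build a tiny fake raw-data folder (made-up values) for the demo.
root <- file.path(tempdir(), "rawDataDemo")
for (split in c("train", "test")) {
  dir.create(file.path(root, split), recursive = TRUE, showWarnings = FALSE)
  writeLines(c("1", "2"),
             file.path(root, split, paste0("subject_", split, ".txt")))
  writeLines(c("1", "1"),
             file.path(root, split, paste0("y_", split, ".txt")))
  writeLines(c("0.1 0.2", "0.3 0.4"),
             file.path(root, split, paste0("X_", split, ".txt")))
}

# Combine both splits into one data set, then average every variable
# for each subject and activity.
combined <- rbind(read_split(root, "train"), read_split(root, "test"))
averages <- aggregate(. ~ subject + activity, data = combined, FUN = mean)
print(averages)
```

With the real download in place, the same helper could presumably be pointed at the `rawData` link, for example `read_split("~/dataProject/rawData", "train")`.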