# Getting and Cleaning Data Project

Jing Li

This repo contains materials for the submission of the course project for the Johns Hopkins Getting and Cleaning Data course.
## Project Description
The purpose of this project is to demonstrate the ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.

- Data for the project: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
- Full description, at the site where the data was obtained: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
## Project Requirements
The run_analysis.R script must perform the following steps:

1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
## Steps Performed to Clean Up and Tidy the Data
Make sure run_analysis.R is in the working directory. The data subdirectory must sit directly under that working directory; it is either created there or already exists.
#### Merges the training and the test sets to create one data set
- Download the Dataset.zip file and store it in the data directory
- Unzip the downloaded file
- Read the train data from the train data files
- Combine the feature, subject and activity train data
- Read the test data from the test data files
- Combine the feature, subject and activity test data
- Merge the train and test data
- Read the feature names from the features file
- Add names to the data set
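
The steps above can be sketched roughly as follows. This is a minimal sketch, not the contents of run_analysis.R: the helper names `fetch_har`, `read_split` and `merge_har` and the column names `subject` and `activity` are assumptions made for illustration. The file layout (`train/X_train.txt`, `train/y_train.txt`, `train/subject_train.txt`, the matching `test` files, and `features.txt`) is the one found inside the UCI HAR Dataset archive.

```r
# Source URL as given in this README.
har_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

# Download and unzip the archive into data_dir (hypothetical helper).
fetch_har <- function(data_dir = "data") {
  if (!dir.exists(data_dir)) dir.create(data_dir)
  zip_path <- file.path(data_dir, "Dataset.zip")
  if (!file.exists(zip_path)) download.file(har_url, zip_path, mode = "wb")
  unzip(zip_path, exdir = data_dir)
  file.path(data_dir, "UCI HAR Dataset")  # folder name inside the archive
}

# Read one split ("train" or "test") and bind features, subject and activity.
read_split <- function(base_dir, split) {
  features <- read.table(file.path(base_dir, split, paste0("X_", split, ".txt")))
  subject  <- read.table(file.path(base_dir, split, paste0("subject_", split, ".txt")))
  activity <- read.table(file.path(base_dir, split, paste0("y_", split, ".txt")))
  # "subject" and "activity" are assumed column names.
  cbind(features, subject = subject[[1]], activity = activity[[1]])
}

# Stack train on top of test, then label the feature columns from features.txt.
merge_har <- function(base_dir) {
  merged <- rbind(read_split(base_dir, "train"), read_split(base_dir, "test"))
  feature_names <- read.table(file.path(base_dir, "features.txt"),
                              stringsAsFactors = FALSE)[[2]]
  names(merged)[seq_along(feature_names)] <- feature_names
  merged
}
```

With these helpers, `merged <- merge_har(fetch_har())` would download the archive on first use and return the combined data set.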
#### Extracts only the measurements on the mean and standard deviation for each measurement
- Examine features.txt for the name patterns that mark mean and standard deviation measurements
- The patterns include mean(), std(), meanFreq()
- Extract the selected data
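
One way to express this selection is a regular expression over the column names. The sketch below uses a handful of feature names as a stand-in for the full list, and it assumes the subject and activity columns are named `subject` and `activity`; the actual script may select columns differently.

```r
# A few column names standing in for the full merged data set.
column_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "fBodyAcc-meanFreq()-X",
                  "tBodyAcc-max()-X", "angle(tBodyAccMean,gravity)",
                  "subject", "activity")

# Match the three patterns literally; the parentheses must be escaped.
pattern <- "mean\\(\\)|std\\(\\)|meanFreq\\(\\)"

# Keep matching features plus the two identifier columns.
keep <- grepl(pattern, column_names) | column_names %in% c("subject", "activity")
selected_names <- column_names[keep]
print(selected_names)
```

Applied to a data frame, the same logical vector can be used as a column index, for example `merged[, keep]`. Indexing by position rather than by name avoids trouble if any feature names are repeated.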
#### Uses descriptive activity names to name the activities in the data set
- Read the activity labels from the activity labels file
- Convert activity to a factor, using the labels as level names
- Convert subject to a factor
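
A small sketch of the factor conversion follows. The label table mirrors the contents of activity_labels.txt in the archive, while the data frame `selected` and its column names are made up for illustration.

```r
# Activity ids and labels as listed in activity_labels.txt.
activity_labels <- data.frame(
  id = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Toy rows standing in for the selected data (column names are assumptions).
selected <- data.frame(subject = c(1, 1, 2), activity = c(5, 1, 6))

# Map numeric activity codes to descriptive labels.
selected$activity <- factor(selected$activity,
                            levels = activity_labels$id,
                            labels = activity_labels$label)

# Treat subject as a categorical identifier rather than a number.
selected$subject <- factor(selected$subject)
print(selected)
```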
#### Appropriately labels the data set with descriptive variable names
- Change mean to Mean and std to Std, and remove - and ()
- Replace the leading t with time
- Replace the leading f with frequency
- Replace Acc with Accelerometer
- Replace Gyro with Gyroscope
- Replace Mag with Magnitude
- Replace BodyBody with Body
- Review and verify the created data set
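
The renaming rules above map directly onto a chain of `gsub` calls. The function name `tidy_names` is hypothetical, and the sketch applies the rules in the order listed; the actual script may order or phrase them differently.

```r
# Apply the renaming rules, in the order listed, to a vector of column names.
tidy_names <- function(x) {
  x <- gsub("mean", "Mean", x)
  x <- gsub("std", "Std", x)
  x <- gsub("[-()]", "", x)           # drop hyphens and parentheses
  x <- gsub("^t", "time", x)          # leading t marks time-domain signals
  x <- gsub("^f", "frequency", x)     # leading f marks frequency-domain signals
  x <- gsub("Acc", "Accelerometer", x)
  x <- gsub("Gyro", "Gyroscope", x)
  x <- gsub("Mag", "Magnitude", x)
  x <- gsub("BodyBody", "Body", x)
  x
}

print(tidy_names(c("tBodyAcc-mean()-X", "fBodyBodyGyroMag-std()", "subject")))
```

Because the patterns are case sensitive, identifier columns such as `subject` and `activity` pass through unchanged.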
#### From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject
- Load the reshape2 library
- Melt selectedData by activity and subject
- Generate the tidy set of averages
- Create the file tidydata.txt containing the tidy data set
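
A minimal sketch of the melt-and-cast step, assuming the reshape2 package is installed. The toy data frame stands in for selectedData, and the use of `dcast` with `mean` is an assumption about how the averages are produced; the steps above only name the melt.

```r
library(reshape2)

# Toy stand-in for selectedData: two subjects, one activity each, two variables.
selected <- data.frame(
  subject = factor(c(1, 1, 2, 2)),
  activity = factor(c("WALKING", "WALKING", "SITTING", "SITTING")),
  timeBodyAccelerometerMeanX = c(0.2, 0.4, 0.1, 0.3),
  timeBodyAccelerometerStdX = c(1.0, 3.0, 2.0, 4.0)
)

# Long format: one row per (subject, activity, variable) observation.
molten <- melt(selected, id.vars = c("subject", "activity"))

# Wide format again, averaging each variable per subject and activity.
tidy <- dcast(molten, subject + activity ~ variable,
              fun.aggregate = mean, value.var = "value")

# Write the result as a plain text table without row names.
write.table(tidy, "tidydata.txt", row.names = FALSE)
```

The resulting file can be read back with `read.table("tidydata.txt", header = TRUE)`.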
## Project Deliverables
- README.md: this file
- CodeBook.md: a codebook that describes the variables, the data, and any transformations performed to clean up the data
- run_analysis.R: an R script that performs all the cleanup and transformation steps to generate the final tidydata.txt file. See the comments in the script for the detailed step-by-step procedure used to clean up and tidy the original data
- tidydata.txt: the file containing the final tidy data set