This assignment uses data from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets. In particular, it uses the "Individual household electric power consumption Data Set" which is available on the course web site:
-
Dataset: Electric power consumption [20Mb]
-
Description: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
The following descriptions of the 9 variables in the dataset are taken from the UCI web site:
- Date: Date in format dd/mm/yyyy
- Time: time in format hh:mm:ss
- Global_active_power: household global minute-averaged active power (in kilowatt)
- Global_reactive_power: household global minute-averaged reactive power (in kilowatt)
- Voltage: minute-averaged voltage (in volt)
- Global_intensity: household global minute-averaged current intensity (in ampere)
- Sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
- Sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
- Sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
When loading the dataset into R, I did the following:
-
Read in the data using the read.delim function, setting
sep = ";"
andna.strings = "?"
-
Combined date and time columns into a single variable DateTime.
-
Converted the DateTime column to date/time format using
strptime()
. -
Filtered out data for dates Feb 1 and Feb 2, 2007 into a second data frame for plotting.
The four plots that I constructed are shown below.