Statistical analysis of US salary data

Primary language: R

Linear Regression Project

STAT GU4205/5205

Data Description

The data comprise roughly 25,000 records for males between the ages of 18 and 70 who are full-time workers. Several variables are recorded for each subject: years of education, years of job experience, college graduate (yes, no), working in or near a city (yes, no), US region (midwest, northeast, south, west), commuting distance, number of employees at the company, and race (African American, Caucasian, Other). The response variable is weekly wages (in dollars). The data were collected many decades ago, so the wages are low compared to current levels.
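
As a rough illustration of how the variables described above might be laid out in R, the sketch below builds a small simulated data frame. All values are randomly generated and none come from the real data. The column names (`wage`, `edu`, `exper`, `deg`, `city`, `reg`, `race`, `com`, `emp`) are hypothetical, and the rule used to derive `deg` is an assumption made only so the example is self-contained.

```r
# Minimal sketch with SIMULATED data; column names are hypothetical and
# may not match the names used in the real data set.
set.seed(42)
n <- 500

salary <- data.frame(
  edu   = sample(8:18, n, replace = TRUE),             # years of education
  exper = sample(0:40, n, replace = TRUE),             # years of job experience
  city  = sample(c("no", "yes"), n, replace = TRUE),   # works in or near a city
  reg   = sample(c("midwest", "northeast", "south", "west"), n, replace = TRUE),
  race  = sample(c("African American", "Caucasian", "Other"), n, replace = TRUE),
  com   = round(runif(n, 0, 50), 1),                   # commuting distance (unit not stated in the source)
  emp   = sample(10:500, n, replace = TRUE),           # number of employees at the company
  stringsAsFactors = FALSE
)

# Assumption for illustration only: 16 or more years of education = graduate.
salary$deg <- ifelse(salary$edu >= 16, "yes", "no")

# Simulated weekly wage in dollars (arbitrary generating model).
salary$wage <- round(
  exp(5 + 0.08 * salary$edu + 0.01 * salary$exper + rnorm(n, sd = 0.4)), 2
)

# Store the categorical variables as factors so lm() creates indicator columns.
cat_vars <- c("deg", "city", "reg", "race")
salary[cat_vars] <- lapply(salary[cat_vars], factor)

# Use African American as the reference level, so the other race
# coefficients are read as differences from that group.
salary$race <- relevel(salary$race, ref = "African American")

str(salary)
```

Choosing African American as the reference level is a convenience for the research questions that follow, since each remaining race coefficient then compares directly against that group.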

Research Question

A government official is interested in whether average male wages are statistically different across the three race classes. Specifically, the government official wants to answer the following research questions:

  1. Do African American males have statistically different wages compared to Caucasian males?
  2. Do African American males have statistically different wages compared to all other males?

The goal of this case study is twofold:

  1. Come up with a linear regression model that incorporates all relevant variables, interactions and functional forms of the covariates.
  2. Using the final model, test the two research questions above.
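
The two steps above could be sketched in R roughly as follows. This is not the project's final model. It uses simulated data, hypothetical column names, and an assumed functional form (log wages with a quadratic term in experience), purely to show how each research question maps onto a test. The second question is read here as African American males versus all other males pooled together, which is one possible interpretation.

```r
# Minimal sketch with SIMULATED data; the model form is an assumption,
# not the final model of the project.
set.seed(1)
n <- 600

race  <- factor(sample(c("African American", "Caucasian", "Other"), n, replace = TRUE))
edu   <- sample(8:18, n, replace = TRUE)
exper <- sample(0:40, n, replace = TRUE)

# Arbitrary generating model for the simulated weekly wages.
wage <- exp(5 + 0.08 * edu + 0.03 * exper - 0.0005 * exper^2 +
            0.15 * (race == "Caucasian") + rnorm(n, sd = 0.4))

dat <- data.frame(wage, edu, exper, race)

# Question 1: with African American as the reference level, the
# "raceCaucasian" coefficient is the adjusted difference between the groups.
dat$race <- relevel(dat$race, ref = "African American")
fit1 <- lm(log(wage) ~ edu + exper + I(exper^2) + race, data = dat)
q1 <- summary(fit1)$coefficients["raceCaucasian", ]

# Question 2: pool Caucasian and Other into one group and test the
# indicator for African American against everyone else.
dat$aa <- factor(ifelse(dat$race == "African American", "yes", "no"))
fit2 <- lm(log(wage) ~ edu + exper + I(exper^2) + aa, data = dat)
q2 <- summary(fit2)$coefficients["aayes", ]

# Partial F-test: does race improve the model once the covariates are included?
fit0   <- update(fit1, . ~ . - race)
race_F <- anova(fit0, fit1)

print(q1)
print(q2)
print(race_F)
```

Each of `q1` and `q2` holds the estimate, standard error, t statistic, and p-value for the relevant coefficient. In a real analysis the covariate set, interactions, and transformations would be chosen through model selection and diagnostics before running these tests.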