reheck/MechaCar_Statistical_Analysis

R

MechaCar_Statistical_Analysis

Linear Regression to Predict MPG

Looking at the Pr(>|t|) value of each coefficient, the multiple linear regression on the MechaCar data shows that ground clearance and vehicle length (as well as the intercept) provide a non-random amount of variance to the mpg values. In other words, ground clearance and vehicle length have a statistically significant effect on mpg. The intercept is also statistically significant, which suggests that other factors not included in this dataset may help explain and predict mpg.
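
As a rough illustration of how the Pr(>|t|) column can be read programmatically, here is a minimal sketch. It uses synthetic stand-in data, not the MechaCar measurements, and the column names `vehicle_weight`, `spoiler_angle`, and `AWD` are assumptions made for the example (the text above only names vehicle length, ground clearance, and mpg).

```r
# Synthetic stand-in data: NOT the MechaCar measurements.
# Column names other than length, clearance, and mpg are assumed for illustration.
set.seed(42)
n <- 50
cars <- data.frame(
  vehicle_length   = rnorm(n, mean = 15, sd = 2),
  vehicle_weight   = rnorm(n, mean = 6000, sd = 1500),
  spoiler_angle    = runif(n, min = 0, max = 90),
  ground_clearance = rnorm(n, mean = 12, sd = 2),
  AWD              = rbinom(n, size = 1, prob = 0.5)
)
# By construction, mpg depends only on length and clearance in this fake data.
cars$mpg <- -100 + 6 * cars$vehicle_length + 3.5 * cars$ground_clearance +
  rnorm(n, sd = 8)

# Fit the multiple linear regression with all five predictors.
model <- lm(mpg ~ vehicle_length + vehicle_weight + spoiler_angle +
              ground_clearance + AWD, data = cars)

# Pull the Pr(>|t|) column and list the terms below the 0.05 threshold.
coef_table  <- summary(model)$coefficients
p_values    <- coef_table[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]

print(round(p_values, 4))
print(significant)
```

The coefficient table has one row per term, including `(Intercept)`, so the intercept's significance can be checked the same way as the predictors'.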

The slope of the linear model is not considered to be zero. The null hypothesis for linear regression is that the slope is zero; because the model's p-value is less than 0.05, we reject the null hypothesis and conclude that the slope is NOT zero.
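
`summary()` prints the overall p-value of a model but does not store it directly. One common way to recover it is from the stored F-statistic, sketched below on made-up data (the variables `x1`, `x2`, and `y` are placeholders, not MechaCar columns).

```r
# Made-up data for illustration only.
set.seed(1)
x1 <- rnorm(40)
x2 <- rnorm(40)
y  <- 2 + 1.5 * x1 + rnorm(40)

fit <- lm(y ~ x1 + x2)

# fstatistic is a named vector: value, numdf, dendf.
f <- summary(fit)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value.
overall_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Reject the null hypothesis (all slopes are zero) at the 0.05 level?
reject_null <- overall_p < 0.05
print(overall_p)
print(reject_null)
```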

This linear model does not predict the mpg of MechaCar prototypes very effectively, for two reasons: the intercept term is statistically significant, and only 2 of the 5 independent variables are statistically significant in explaining the variance in mpg. Although the multiple linear regression fits this dataset well, there is evidence of overfitting, meaning the model may fit the current data closely yet fail to generalize to future data.
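
One way to probe an overfitting concern like this is a simple holdout check: fit on part of the data and compare R-squared on the rows that were held out. The sketch below is not part of the original analysis; it uses made-up data and a hypothetical helper, `r_squared()`, purely to show the idea.

```r
# Made-up data for illustration only; x2 and x3 are pure noise predictors.
set.seed(7)
n <- 60
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 + rnorm(n)

# Hold out one third of the rows for testing.
train_idx <- sample(seq_len(n), size = 40)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

fit <- lm(y ~ x1 + x2 + x3, data = train)

# Hypothetical helper: R-squared of predictions against actual values.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

train_r2 <- r_squared(train$y, fitted(fit))
test_r2  <- r_squared(test$y, predict(fit, newdata = test))

print(c(train = train_r2, test = test_r2))
```

A test R-squared far below the training R-squared would be consistent with overfitting; similar values would suggest the model generalizes reasonably.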

Summary Statistics on Suspension Coils

The design specifications for the MechaCar suspension coils dictate that the variance of the suspension coils must not exceed 100 pounds per square inch (psi). Across the total manufacturing data set, the variance is less than 100 psi and therefore meets the design specifications. The same is true for Lot 1 and Lot 2 individually. Lot 3, however, does NOT meet the design specifications when taken alone: its variance of 170.286 is well above the 100 psi allowed by the design spec.

Total_Summary

Lot_Summary
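
The total and per-lot summaries described in this section could be computed along the following lines. This is a sketch in base R on deterministic, made-up PSI values, not the real coil data; the column names `Manufacturing_Lot` and `PSI` and the helper `summarise_psi()` are assumptions for the example.

```r
# Deterministic made-up PSI values: NOT the real suspension coil data.
spread <- seq(-1, 1, length.out = 50)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(1500 + 2 * spread, 1500 + 5 * spread, 1500 + 25 * spread)
)

# Hypothetical helper returning the four summary statistics.
summarise_psi <- function(x) {
  c(Mean = mean(x), Median = median(x), Variance = var(x), SD = sd(x))
}

# Summary over all lots combined.
total_summary <- summarise_psi(coils$PSI)

# One row per lot, one column per statistic.
lot_summary <- t(sapply(split(coils$PSI, coils$Manufacturing_Lot), summarise_psi))

# Compare each lot's variance with the design limit of 100.
meets_spec <- lot_summary[, "Variance"] <= 100

print(total_summary)
print(lot_summary)
print(meets_spec)
```

In this toy data the widest lot fails the limit on its own while the combined variance still passes, which is the same pattern the section describes.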

T-Tests on Suspension Coils

The t-tests on the suspension coil data show that, for all manufacturing lots combined, the test fails to reject the null hypothesis because the p-value is greater than 0.05. There is therefore not enough evidence to conclude that the true mean differs from the population mean of 1500. The same is true for Manufacturing Lot 1 and Manufacturing Lot 2, according to the t-tests on those respective subsets of the data. For Manufacturing Lot 3, however, the t-test rejects the null hypothesis because the p-value is 0.04168, which is less than 0.05. This supports the alternative hypothesis that the true mean of Lot 3 is NOT equal to the population mean of 1500.

All Manufacturing Lots

Manufacturing Lot 1

Manufacturing Lot 2

Manufacturing Lot 3
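
The one-sample t-tests summarized in this section can be sketched with `t.test()` and its `mu` argument. The data below is randomly generated for illustration, so its p-values will not match the results reported above; the column names `Manufacturing_Lot` and `PSI` are assumptions.

```r
# Randomly generated stand-in data: NOT the real suspension coil data.
set.seed(11)
coils <- data.frame(
  Manufacturing_Lot = rep(c("Lot1", "Lot2", "Lot3"), each = 50),
  PSI = c(rnorm(50, 1500, 2), rnorm(50, 1500, 3), rnorm(50, 1495, 12))
)

# All lots combined against the population mean of 1500.
all_lots <- t.test(coils$PSI, mu = 1500)

# The same test repeated on each lot's subset of the data.
lot_p_values <- sapply(
  split(coils$PSI, coils$Manufacturing_Lot),
  function(psi) t.test(psi, mu = 1500)$p.value
)

# Decision at the 0.05 significance level.
decision <- ifelse(lot_p_values < 0.05, "reject H0", "fail to reject H0")

print(all_lots$p.value)
print(lot_p_values)
print(decision)
```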

Study Design: MechaCar vs Competition

To quantify how the MechaCar performs against the competition, more data would need to be collected and a statistical study performed on that data. In these days of high gas prices and human-induced global climate change, one metric of the utmost importance in a vehicle is city and highway fuel efficiency. The statistical test to perform would be a two-sample t-test comparing the fuel efficiency of the MechaCar to that of a competitor's vehicle. The null hypothesis would be that there is no difference between the mean fuel efficiencies, and the alternative hypothesis would be that the mean fuel efficiencies differ. The fuel efficiency data for this two-sample t-test is numerical and continuous, and it would need to come from a reasonably large sample for both the MechaCar and the competitors' vehicles. The two-sample t-test would need to be performed once for each competitor vehicle in order to compare the MechaCar to each of the different competitors' vehicles.
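
A minimal sketch of the proposed study, assuming fuel efficiency samples were available as numeric vectors. The values are simulated and the competitor names are hypothetical. Note that `t.test()` with two samples runs Welch's t-test by default, which does not assume equal variances.

```r
# Simulated fuel efficiency samples: purely illustrative, not collected data.
set.seed(5)
mechacar_mpg <- rnorm(40, mean = 30, sd = 4)

# Hypothetical competitors, one numeric vector of mpg values each.
competitors <- list(
  competitor_a = rnorm(40, mean = 27, sd = 4),
  competitor_b = rnorm(40, mean = 30, sd = 4)
)

# One two-sample t-test per competitor (Welch's test by default).
p_values <- sapply(
  competitors,
  function(mpg) t.test(mechacar_mpg, mpg)$p.value
)

# TRUE where the null hypothesis of equal means is rejected at 0.05.
rejects_null <- p_values < 0.05

print(p_values)
print(rejects_null)
```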