- Linear relationship between Input and Output
- Simple : One Independent Variable
- Multiple : More than One Independent Variable
- OLS :
- Ordinary Least Squares
- Sum of all [(Actual - Predicted)^2] = Total Error
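
As a minimal sketch of the OLS idea above, the snippet below fits a simple regression line with the closed-form formulas and computes the total squared error. The data and the helper names `fit_simple_ols` and `sum_squared_error` are made up for illustration.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by variation of x
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(x, y, intercept, slope):
    """Sum of (actual - predicted)^2 over all observations."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_ols(x, y)
total_error = sum_squared_error(x, y, intercept, slope)
print(intercept, slope, total_error)
```

On this data the fitted line is roughly `y = 2.2 + 0.6 * x`, and any other intercept or slope gives a larger total error.
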
- Steps to build Regression Model
- Select all variables
- Stepwise Regression - Backward & Forward
- Model Score comparison
- Linearity : X linear to Y
- Constant Error Variance : Homoscedasticity
- Independent Error Terms : No Autocorrelation
- Normal Error : Normal distribution of Error
- No multicollinearity : Independent X variables
- Exogeneity : No Omitted Variable Bias
- Plot of Error
- Homoscedasticity : Same Variance
- Heteroscedasticity : Different Variance
- Covariance : Direction of a relationship between variables
- Correlation : Strength & Direction of a relationship between variables
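
To illustrate the difference between the two measures above, here is a small sketch in plain Python. The data and the function names `covariance` and `correlation` are illustrative assumptions, and the sample (n - 1) form of covariance is used.

```python
import math


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data: y rises with x; y_scaled is the same relationship in other units
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
y_scaled = [v * 100 for v in y]

cov_xy = covariance(x, y)
cov_scaled = covariance(x, y_scaled)
corr_xy = correlation(x, y)
corr_scaled = correlation(x, y_scaled)
print(cov_xy, cov_scaled)    # the sign gives direction, the size depends on units
print(corr_xy, corr_scaled)  # bounded between -1 and 1, unaffected by units
```

The covariance grows a hundredfold when `y` is rescaled, while the correlation stays the same, which is why correlation is used to judge strength.
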
- Why is collinearity a Problem ?
- Check Collinearity
- Multicollinearity
- Similarity between observations as a function of time lag between them
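
The definition above can be sketched as a lag-k autocorrelation estimate. This is one common estimator (full-sample mean and variance); the function name `autocorrelation` and both series are made up for illustration.

```python
def autocorrelation(series, lag):
    """Lag-k autocorrelation: co-variation of the series with itself shifted by `lag`,
    divided by the total variation of the series."""
    n = len(series)
    mean = sum(series) / n
    variation = sum((v - mean) ** 2 for v in series)
    co_variation = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return co_variation / variation


# Hypothetical series
trending = [1, 2, 3, 4, 5, 6, 7, 8]            # neighbors are similar
alternating = [1, -1, 1, -1, 1, -1, 1, -1]     # neighbors are opposite

trend_lag1 = autocorrelation(trending, 1)
alt_lag1 = autocorrelation(alternating, 1)
alt_lag2 = autocorrelation(alternating, 2)
print(trend_lag1, alt_lag1, alt_lag2)
```

A value near zero at every lag suggests the observations (for example, regression residuals ordered in time) are not related to their own past.
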
- Detects multicollinearity in Regression
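
The line above does not name the diagnostic; a common choice is the variance inflation factor (VIF), which is assumed here. VIF for a predictor is `1 / (1 - R^2)`, where `R^2` comes from regressing that predictor on the others. The sketch below covers only the two-predictor case, where that `R^2` equals the squared correlation between the two predictors. All names and data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is the squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors
x1 = [1, 2, 3, 4, 5]
x2_unrelated = [3, 1, 2, 1, 3]            # uncorrelated with x1
x2_collinear = [1.1, 2.0, 2.9, 4.2, 5.0]  # almost a copy of x1

vif_unrelated = vif_two_predictors(x1, x2_unrelated)
vif_collinear = vif_two_predictors(x1, x2_collinear)
print(vif_unrelated, vif_collinear)
```

A VIF of 1 means no inflation; rules of thumb often flag values above 5 or 10, though the cutoff is a judgment call.
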
- Step-by-step checking of Regression Assumptions
- Assess Model Performance
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Percentage Error (MAPE)
- Mean Percentage Error (MPE)
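
The five metrics listed above can be sketched in a few lines. The function name `regression_metrics` and the numbers are illustrative; errors are taken as actual minus predicted, and the percentage metrics assume no actual value is zero.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE and MPE for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    rmse = math.sqrt(mse)
    # Percentage metrics divide by the actual value, so it must be non-zero
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mpe = 100 * sum(e / a for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "MPE": mpe}


# Hypothetical values: one over-prediction, one under-prediction, one exact
actual = [100, 200, 400]
predicted = [110, 180, 400]

metrics = regression_metrics(actual, predicted)
print(metrics)
```

Here MPE comes out at zero because the positive and negative percentage errors cancel, while MAPE does not; MPE indicates bias, not accuracy.
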
- R Square
- Total Variation = Explained Variation + Unexplained Variation
- SST = SSR + SSE
- SST = Sum of Squares Total = Sum of (Actual - Mean)^2 = total variation
- SSR = Sum of Squares Regression = Sum of (Predicted - Mean)^2 = explained variation
- SSE = Sum of Squares Error = Sum of (Actual - Predicted)^2 = unexplained variation
- R = Correlation Coefficient
- R Square = SSR / SST
- R Square is the proportion of variation explained by the Model
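
A worked example of the decomposition, taking SSR as the explained part (predicted minus mean) and SSE as the unexplained part (actual minus predicted). The data is hypothetical, and the identity SST = SSR + SSE relies on the line being an OLS fit with an intercept.

```python
# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# OLS fit of y on x (closed form for one predictor)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)                   # total variation
ssr = sum((pi - mean_y) ** 2 for pi in predicted)           # explained variation
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # unexplained variation

r_squared = ssr / sst
print(sst, ssr, sse, r_squared)
```

For this data SST is 6, SSR is about 3.6 and SSE is about 2.4, so R Square is about 0.6: the line accounts for roughly 60% of the variation in `y`.
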
- Evaluates two mutually exclusive statements
- Null Hypothesis is always neutral (no relationship between variables)
- Alternate Hypothesis states the opposite (there is a relationship between variables)
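- Probability of observing a result at least as extreme as the sample's, assuming the Null Hypothesis is True
- All statistical packages report the P Value of the test against the Null Hypothesis
- So a small P value (below alpha) is evidence in favor of the Alternate Hypothesis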
- Level of Significance is the Probability of rejecting the Null Hypothesis when it is actually True, denoted by (alpha)
- Confidence Level is the Probability of not rejecting the Null Hypothesis when it is actually True, denoted by (1 - alpha)
- Measure of Uncertainty in the Sample Mean
- Population Mean != Sample Mean
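
The uncertainty described above is commonly quantified as the standard error of the mean: the sample standard deviation divided by the square root of the sample size. A minimal sketch, with a made-up sample and a hypothetical helper name `standard_error`:

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Hypothetical sample drawn from some larger population
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = statistics.mean(sample)
se = standard_error(sample)
print(sample_mean, se)
```

A smaller standard error means the sample mean is expected to sit closer to the population mean; it shrinks as the sample grows.
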
- Akaike Information Criterion
- Bayesian Information Criterion
- Both penalize model complexity
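
Both criteria can be written directly from their standard formulas, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of observations and ln(L) the maximized log-likelihood. The log-likelihood values below are invented purely to show the penalty at work.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical models fitted to the same n = 50 observations:
# the complex model fits slightly better but uses twice the parameters
n = 50
simple_aic = aic(-100.0, k=3)
complex_aic = aic(-99.0, k=6)
simple_bic = bic(-100.0, k=3, n=n)
complex_bic = bic(-99.0, k=6, n=n)
print(simple_aic, complex_aic, simple_bic, complex_bic)
```

Lower is better for both. In this made-up case the small gain in fit does not pay for the extra parameters, so both criteria prefer the simpler model, and BIC penalizes the extra parameters more heavily than AIC because ln(50) is greater than 2.
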
- Overall output is explained in depth
- Python statsmodels output explained in depth
- Step by step code of Simple & Multiple Regression