Correlation doesn't equal causation by CrashCourse Statistics
EmbraceLife opened this issue · 0 comments
Correlation doesn't equal causation
Key words
correlation, causation, bivariate data, regression, regression coefficient, correlation coefficient, squared correlation, spurious correlation
Video links
Key questions
How to use one variable to predict another?
How to describe the linear relationship between two variables?
Interesting points
Bivariate Data
the values of two variables are paired to form a single data point
scatterplot to describe bivariate data
clusters in the scatterplot reveal the relationship
- shorter waiting times go with shorter eruptions
- longer waiting times go with longer eruptions
Regression
The linear relationship between the heights of fathers and sons
- described by a line y = mx + b
once m and b have been determined, the regression line lets us predict the value of one variable from the other (figure below)
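As a minimal sketch of how m and b could be determined, the code below assumes an ordinary least-squares fit, which these notes do not spell out. The height values and the names `fit_line` and `predict` are made up for illustration and are not taken from the video.

```python
def fit_line(xs, ys):
    """Return the least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def predict(x, m, b):
    """Predict y for a given x using the fitted line."""
    return m * x + b


# Hypothetical father/son heights in inches (illustrative numbers only).
fathers = [65.0, 67.0, 68.0, 70.0, 72.0, 74.0]
sons = [66.5, 67.5, 69.0, 70.0, 71.5, 73.0]

m, b = fit_line(fathers, sons)
print(f"m = {m:.3f}, b = {b:.3f}")
print(f"predicted height for the son of a 69-inch father: {predict(69.0, m, b):.1f}")
```

Once `m` and `b` are fixed, prediction is just a substitution into y = mx + b.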
Regression Coefficient
m in y = mx + b is the regression coefficient
it tells us how much y changes for each unit change in x
however, it does not tell us how closely x and y are related: changing the units of y alone can make m very small without changing the relationship (see figure below)
so we want a stable metric for the relationship between the two variables
correlation
measure the way two variables move together
measure the direction and closeness of their movement
it focuses on linear relationships
we have positive and negative correlation
and with no correlation we get a blob or cloud of points
Correlation Coefficient
to keep unit changes from affecting the measure of correlation
- use the standard deviations to scale the correlation to [-1, 1]
- the result is the correlation coefficient r
when r = ±1, x predicts y exactly
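The scaling described above can be sketched as the covariance of x and y divided by the product of their standard deviations (the 1/(n-1) factors cancel, so plain sums are enough). The function name `pearson_r` and the sample values are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(xs, [3.0, 5.0, 7.0, 9.0])      # points exactly on a rising line
r_down = pearson_r(xs, [9.0, 7.0, 5.0, 3.0])    # points exactly on a falling line
r_noisy = pearson_r(xs, [2.0, 1.0, 4.0, 3.0])   # related, but not perfectly

print(f"rising line:  r = {r_up:.3f}")
print(f"falling line: r = {r_down:.3f}")
print(f"noisy data:   r = {r_noisy:.3f}")
```

Points lying exactly on a line give r = +1 or r = -1 depending on the direction; anything noisier lands strictly in between.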
steepness of the regression line, |m|
- has nothing to do with the correlation coefficient
- a steep regression line can come with either a strong correlation or almost none
- steepness can be changed arbitrarily by changing the units of one variable alone
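To make the units point concrete, the sketch below fits the same made-up data twice, once with heights in centimeters and once in meters. The data and the helper names `slope` and `pearson_r` are assumptions for illustration.

```python
import math


def slope(xs, ys):
    """Least-squares regression coefficient m for y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: ages in years and heights in centimeters.
ages = [2.0, 4.0, 6.0, 8.0, 10.0]
heights_cm = [86.0, 102.0, 115.0, 128.0, 138.0]
heights_m = [h / 100 for h in heights_cm]  # same heights, different unit

m_cm, m_m = slope(ages, heights_cm), slope(ages, heights_m)
r_cm, r_m = pearson_r(ages, heights_cm), pearson_r(ages, heights_m)

print(f"slope: {m_cm:.3f} cm/year vs {m_m:.4f} m/year")
print(f"r:     {r_cm:.4f} vs {r_m:.4f}")
```

The slope shrinks by a factor of 100 when the unit changes, while r stays the same, which is why r rather than m is used to judge how closely the variables move together.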
Squared Correlation
$r^2$
- between 0 and 1
- the proportion of the variance in one variable that is predicted by the other
$r^2 = 0.7$
- cigarette use can explain 70% of the variation in lung health
$r^2 =1$
- exact prediction with no randomness
- like converting temperatures from Celsius to Fahrenheit
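The temperature example can be checked directly. In this sketch the Fahrenheit values are computed from the Celsius ones with F = C * 9/5 + 32, so there is no randomness left for $r^2$ to miss; the sample temperatures and the name `r_squared` are illustrative assumptions.

```python
import math


def r_squared(xs, ys):
    """Squared correlation: the square of the correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    return r * r


celsius = [-40.0, 0.0, 20.0, 37.0, 100.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # exact unit conversion

r2_exact = r_squared(celsius, fahrenheit)
# The same readings with made-up measurement error added.
r2_noisy = r_squared(celsius, [-35.0, 40.0, 60.0, 105.0, 200.0])

print(f"exact conversion: r^2 = {r2_exact:.4f}")
print(f"noisy readings:   r^2 = {r2_noisy:.4f}")
```

The exact conversion gives $r^2 = 1$ up to floating-point rounding, while the noisy version falls below 1.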
Correlation is not Causation
what makes two variables correlated?
spurious correlation
- no causal link
- correlated by random chance
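Correlation by random chance is easy to reproduce in a simulation. The sketch below is an illustrative assumption rather than anything from the video: it draws many pairs of unrelated random series and keeps the strongest correlation it finds. The function names, the number of pairs, and the series length are arbitrary choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def strongest_chance_correlation(n_pairs, n_points, seed=0):
    """Return the r with the largest |r| among n_pairs of unrelated random series."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        # xs and ys are drawn independently, so there is no causal link at all.
        xs = [rng.gauss(0, 1) for _ in range(n_points)]
        ys = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson_r(xs, ys)
        if abs(r) > abs(best):
            best = r
    return best


best = strongest_chance_correlation(n_pairs=1000, n_points=5)
print(f"strongest correlation among 1000 unrelated pairs: r = {best:.3f}")
```

With short series and many pairs to choose from, some pair will look strongly correlated even though every series was generated independently.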
Data relationship beyond correlation
the plots below have the same $r$ and $r^2$, but their relationships certainly look very different
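These notes do not say which datasets the plots come from. As a stand-in, the sketch below uses the first two datasets of Anscombe's quartet, a standard published example of this effect: one is a noisy line, the other a smooth curve.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Anscombe's quartet, datasets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_line = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_curve = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r_line = pearson_r(x, y_line)
r_curve = pearson_r(x, y_curve)

print(f"noisy line:   r = {r_line:.3f}, r^2 = {r_line ** 2:.3f}")
print(f"smooth curve: r = {r_curve:.3f}, r^2 = {r_curve ** 2:.3f}")
```

Both datasets come out at roughly r = 0.82, so the numbers alone cannot distinguish a noisy line from a curve; only a plot shows the difference.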
Closure
correlation
- describes the linear relationship between two variables
- goes beyond y = mx + b and tells us how well the line explains the data
- helps with prediction
- helps review the past