EmbraceLife/shendusuipian

Correlation doesn't equal causation by CrashCourse Statistics

EmbraceLife opened this issue

Key words

correlation, causation, bivariate data, regression, regression coefficient, correlation coefficient, squared correlation, spurious correlation

Video links

Bilibili

YouTube

Key questions

How can one variable be used to predict another?

How can we describe the linear relationship between two variables?

Interesting points

Bivariate Data

the values of two variables, recorded as a pair, form a single data point

a scatterplot is used to visualize bivariate data

the relationship shows up as two clusters:

  • shorter waiting times go with shorter eruptions
  • longer waiting times go with longer eruptions

Regression

Linear relationship between the heights of fathers and sons

  • described by a line $y = mx + b$

once m and b have been determined, the regression line lets us predict the value of one variable from the value of the other
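A minimal sketch of fitting and using such a line, assuming the line is fitted by ordinary least squares (the notes do not say how m and b are determined). The father/son heights are made-up illustrative numbers, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar  # the least-squares line passes through (x_bar, y_bar)
    return m, b


def predict(m, b, x):
    """Predict y for a new x using an already fitted line."""
    return m * x + b


# Hypothetical heights in cm (illustrative only, not real data)
fathers_cm = [165.0, 170.0, 175.0, 180.0, 185.0]
sons_cm = [172.0, 168.0, 178.0, 174.0, 183.0]

m, b = fit_line(fathers_cm, sons_cm)
print(f"m = {m:.2f}, b = {b:.2f}")
print(f"predicted son height for a 178 cm father: {predict(m, b, 178.0):.2f} cm")
```

Fitting happens once; after that, prediction is just evaluating $mx + b$ for a new x.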

Regression Coefficient

in $y = mx + b$, the slope $m$ is the regression coefficient

it tells us how much y changes for a one-unit change in x

however, it does not tell us how closely x and y are related: merely changing the units of y can make m very small (or very large) even though the relationship itself is unchanged

so we want a unit-independent metric of how strongly the two variables are related
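A small sketch of the unit problem, assuming a least-squares slope (`slope` is a hypothetical helper, and the heights are made-up numbers): expressing the sons' heights in metres instead of centimetres makes m 100 times smaller, although the data points describe exactly the same relationship.

```python
def slope(xs, ys):
    """Least-squares slope m of the line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical heights (illustrative only, not real data)
fathers_cm = [165.0, 170.0, 175.0, 180.0, 185.0]
sons_cm = [172.0, 168.0, 178.0, 174.0, 183.0]
sons_m = [h / 100 for h in sons_cm]  # the same heights, expressed in metres

m_cm = slope(fathers_cm, sons_cm)  # cm of son height per cm of father height
m_m = slope(fathers_cm, sons_m)    # metres of son height per cm of father height
print(f"m with y in cm: {m_cm:.4f}")
print(f"m with y in m:  {m_m:.4f}")  # 100 times smaller, same relationship
```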

Correlation

measures how two variables move together

measures the direction and closeness of their movement

captures only linear relationships

correlation can be positive or negative

with little or no correlation, the points form a shapeless blob or cloud

Correlation Coefficient

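to keep a change of units from affecting the measure of correlation:

  • divide by the standard deviations of both variables, which scales the measure to the range [-1, 1]
  • the result is the correlation coefficient $r$

when $r = \pm 1$, x predicts y exactly (every point lies on the regression line)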
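A sketch of computing $r$ as the covariance divided by the product of the two standard deviations (the standard Pearson formula; `pearson_r` is a hypothetical helper name and the heights are made-up numbers). It checks the two properties above: $r$ does not change when the units of y change, and points lying exactly on a line give $r = \pm 1$.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = cov(x, y) / (s_x * s_y).

    The 1/(n-1) factors in the covariance and in both standard
    deviations cancel, so they are left out.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (illustrative only, not real data)
fathers_cm = [165.0, 170.0, 175.0, 180.0, 185.0]
sons_cm = [172.0, 168.0, 178.0, 174.0, 183.0]
sons_m = [h / 100 for h in sons_cm]

r_cm = pearson_r(fathers_cm, sons_cm)
r_m = pearson_r(fathers_cm, sons_m)  # same value: r is unit-free

xs = [1.0, 2.0, 3.0, 4.0]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # exact increasing line
r_down = pearson_r(xs, [-3 * x + 5 for x in xs])  # exact decreasing line

print(f"r (cm) = {r_cm:.4f}, r (m) = {r_m:.4f}")
print(f"exact lines: r = {r_up:.1f} and r = {r_down:.1f}")
```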

steepness of the regression line, $|m|$

  • says nothing about the strength of the correlation coefficient $r$
  • a steep regression line can go with either a strong correlation or almost no relationship
  • the steepness can be changed arbitrarily just by changing the units of one variable
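Writing both quantities with the same sums shows why. Assuming the regression line is the ordinary least-squares fit (the notes do not say how it is fitted), with $s_x$ and $s_y$ the sample standard deviations of x and y:

$$
m = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2},
\qquad
r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \,\sum_i (y_i-\bar{y})^2}}
\quad\Longrightarrow\quad
m = r\,\frac{s_y}{s_x}
$$

For a fixed $r$, the slope can be made as steep or as flat as we like by rescaling $s_y / s_x$, for example by changing units; only the sign of $m$ is tied to the sign of $r$.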

Squared Correlation

$r^2$

  • between 0 and 1
  • the proportion of the variance in one variable that is explained by the other

$r^2 = 0.7$

  • e.g. cigarette use would explain 70% of the variation in lung health

$r^2 = 1$

  • exact prediction with no randomness
  • like converting temperatures from Celsius to Fahrenheit
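A sketch that computes $r^2$ as the share of the variance in y explained by a least-squares line, $1 - \text{SS}_\text{res} / \text{SS}_\text{tot}$ (assuming a least-squares fit with an intercept, where this equals the square of $r$). The helper names and height data are hypothetical; the exact Celsius-to-Fahrenheit conversion has no randomness, so it gives $r^2 = 1$ up to floating-point rounding.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def r_squared(xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    m, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total
    return 1 - ss_res / ss_tot


# Hypothetical heights (illustrative only, not real data)
fathers_cm = [165.0, 170.0, 175.0, 180.0, 185.0]
sons_cm = [172.0, 168.0, 178.0, 174.0, 183.0]

celsius = [-10.0, 0.0, 15.0, 25.0, 37.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # exact conversion, no noise

r2_heights = r_squared(fathers_cm, sons_cm)
r2_temps = r_squared(celsius, fahrenheit)
print(f"heights: r^2 = {r2_heights:.3f}")
print(f"C -> F:  r^2 = {r2_temps:.3f}")
```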

Correlation is not Causation

what makes two variables correlated?

spurious correlation

  • no causal link
  • correlated by random chance
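A small simulation sketch of correlation by pure chance: the two series are generated independently, so there is no causal link between them, yet with only five points per sample, strong-looking correlations turn up regularly. The sample size, number of trials, 0.8 threshold, and seed are arbitrary choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)  # fixed seed so the run is reproducible
N_POINTS = 5     # very small samples make chance correlations easy to find
TRIALS = 1000
THRESHOLD = 0.8

strong = 0
max_abs_r = 0.0
for _ in range(TRIALS):
    xs = [random.gauss(0.0, 1.0) for _ in range(N_POINTS)]
    ys = [random.gauss(0.0, 1.0) for _ in range(N_POINTS)]  # independent of xs
    r = pearson_r(xs, ys)
    max_abs_r = max(max_abs_r, abs(r))
    if abs(r) > THRESHOLD:
        strong += 1

print(f"{strong} of {TRIALS} unrelated pairs have |r| > {THRESHOLD}")
print(f"largest |r| seen: {max_abs_r:.3f}")
```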

Data relationships beyond correlation

datasets can have the same $r$ and $r^2$ while their relationships look very different
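A well-known illustration of this point is Anscombe's quartet (not necessarily the plots used in the video): four small published datasets whose r and least-squares line are nearly identical, even though set I is roughly linear, set II is curved, set III is a tight line with one outlier, and set IV is a vertical stack of points plus one far point. The sketch below computes those statistics; `summary` is a hypothetical helper name.

```python
import math


def summary(xs, ys):
    """Return (r, m, b): Pearson r and the least-squares slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    m = sxy / sxx
    return sxy / math.sqrt(sxx * syy), m, y_bar - m * x_bar


# Anscombe's quartet: sets I-III share the same x values
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {}
for name, (xs, ys) in QUARTET.items():
    results[name] = summary(xs, ys)
    r, m, b = results[name]
    print(f"{name:>3}: r = {r:.3f}, r^2 = {r * r:.3f}, y = {m:.3f}x + {b:.2f}")
```

All four lines print almost the same numbers, so only a plot reveals how different the datasets are.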

Closure

Correlation

  • describes the linear relationship between two variables
  • goes beyond $y = mx + b$ by telling us how well the line fits the data
  • helps with prediction
  • helps us review and understand past data