sophryu99/TIL

Hypothesis test: Alpha decision level, p-value


What does alpha mean in a hypothesis test?

  • Before you run any statistical test, you must first determine your alpha level, which is also called the “significance level.”
  • Alpha level is the probability of rejecting the null hypothesis when the null hypothesis is true.
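  • In other words, it's the probability of one specific wrong decision: a false positive (Type I error), where you reject a null hypothesis that is actually true.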

Thanks to famed statistician R. A. Fisher, most folks typically use an alpha level of 0.05. However, if you’re analyzing airplane engine failures, you may want to lower the probability of making a wrong decision and use a smaller alpha. On the other hand, if you're making paper airplanes, you might be willing to increase alpha and accept the higher risk of making the wrong decision.

Like all probabilities, alpha ranges from 0 to 1.
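To make the definition of alpha concrete, here is a minimal simulation sketch. It assumes a two-sided z-test with known standard deviation and the common decision rule "reject the null hypothesis when the p-value is below alpha"; neither is spelled out in these notes, and the function names (`normal_cdf`, `two_sided_p_value`, `type_i_error_rate`) are made up for illustration. Every sample is drawn with the null hypothesis true, so each rejection is a wrong decision, and the rejection rate should land close to alpha.

```python
import math
import random


def normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p_value(sample, mu0, sigma):
    """Two-sided z-test p-value for H0: mean == mu0, assuming sigma is known."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def type_i_error_rate(alpha, trials=20000, n=30, seed=0):
    """Fraction of trials that reject H0 even though H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(trials):
        # Data are generated with true mean 0, so H0 (mean == 0) holds.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1  # assumed rule: reject when p-value < alpha
    return rejections / trials


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.10):
        print(alpha, type_i_error_rate(alpha))
```

Lowering alpha (the airplane engine case) lowers the false positive rate in this sketch, and raising it (the paper airplane case) increases it.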

What is the p-value of a hypothesis test?

  • The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a result as extreme as, or more extreme than, the result actually obtained.
  • Put differently, the p-value measures how compatible your sample data are with the null hypothesis (e.g., the average cost of Cairn terriers = $400). If you obtain a p-value of 0.85, you have little reason to doubt the null hypothesis. If your p-value is, say, 0.02, data at least this extreme would turn up only about 2% of the time if the null hypothesis were in fact true.

And since the p-value is a probability just like alpha, p-values also range from 0 to 1.

Calculating p-value

The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed test, upper-tailed test, or two-sided test).

The p-value for:

  • a lower-tailed test is specified by: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
  • an upper-tailed test is specified by: p-value = P(TS ≥ ts | H0 is true) = 1 - cdf(ts)
  • assuming that the distribution of the test statistic under H0 is symmetric about 0, a two-sided test is specified by: p-value = 2 * P(TS ≥ |ts| | H0 is true) = 2 * (1 - cdf(|ts|))

P: Probability of an event
TS: Test statistic
ts: observed value of the test statistic calculated from your sample
cdf(): Cumulative distribution function of the distribution of the test statistic (TS) under the null hypothesis
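The three formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test statistic is taken to follow a standard normal distribution under H0 (a z-test), and `normal_cdf` and `p_value` are hypothetical helper names, not part of any library. For a different test, pass the appropriate cdf instead.

```python
import math


def normal_cdf(x):
    """CDF of the standard normal distribution (assumed null distribution)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_value(ts, tail="two-sided", cdf=normal_cdf):
    """p-value for an observed test statistic ts.

    tail: "lower", "upper", or "two-sided".
    cdf:  CDF of the test statistic under H0; the two-sided formula
          additionally assumes that distribution is symmetric about 0.
    """
    if tail == "lower":
        return cdf(ts)                        # P(TS <= ts | H0)
    if tail == "upper":
        return 1.0 - cdf(ts)                  # P(TS >= ts | H0)
    if tail == "two-sided":
        return 2.0 * (1.0 - cdf(abs(ts)))     # 2 * P(TS >= |ts| | H0)
    raise ValueError(f"unknown tail: {tail!r}")


if __name__ == "__main__":
    ts = 1.96  # illustrative observed z statistic
    for tail in ("lower", "upper", "two-sided"):
        print(tail, round(p_value(ts, tail), 4))
```

With ts = 1.96, the two-sided p-value comes out at roughly 0.05, which is why 1.96 is the familiar cutoff for a two-sided z-test at alpha = 0.05.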