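A one-sample z-test is a basic hypothesis test that can be used when the population mean and standard deviation are known.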
You will be able to:
- Explain use cases for a 1-sample z-test
- Set up null and alternative hypotheses
- Use the z-table and SciPy methods to acquire the p-value for a given z-score
- Calculate and interpret the p-value for significance of results
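The one-sample z-test is suited to situations where you want to investigate whether a given sample comes from a particular population.
The best way to explain how one-sample z-tests work is by using an example.
Let's set up a problem scenario (known as a research question or analytical question) and apply a one-sample z-test to it.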
A data scientist wants to examine whether tutoring has an effect on IQ scores. To analyze this, she gives IQ tests to a sample of 40 students and compares their scores with those of the general population. A standardized IQ test is designed to have a population mean of 100 and a standard deviation of 16. Her group of students, however, has an average IQ of 103. Based on this finding, does tutoring make a difference?
The alternative hypothesis always reflects the idea or theory that needs to be tested. For this problem, you want to test if tutoring has resulted in a significant increase in student IQ. So, you would write it down as:
The sample mean is significantly bigger than the population mean
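Again, significance is key here. If we denote the sample mean as $\bar{x}$ and the population mean as $\mu$, you can write the alternative hypothesis as:
$$H_a: \mu < \bar{x}$$
The alternative hypothesis here is that the population mean $\mu$ is less than the sample mean $\bar{x}$. In other situations, you could test for a difference in either direction by checking $\mu \neq \bar{x}$.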
Maybe the tutoring results in a lower IQ... Who knows!
For now, you'll just check for a significant increase, to keep the process simple.
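For a one-sample z-test, you define the null hypothesis as there being no significant difference between the sample mean and the population mean: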
There is no significant difference between the sample mean and population mean
Remember the emphasis is on a significant difference, rather than just any difference as a natural result of taking samples.
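Denoting the sample mean as $\bar{x}$ and the population mean as $\mu$, you can write the null hypothesis as:
$$H_0: \mu = \bar{x}$$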
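Now that your hypotheses are in place, you have to decide on your significance level alpha ($\alpha$), the cut-off value for deciding whether you can reject the null hypothesis.
As discussed previously, $\alpha$ is often set to 0.05, which means accepting a 5% chance of rejecting the null hypothesis when it is actually true.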
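Later, you'll see that using $\alpha = 0.05$ lets you state your test result at a 95% confidence level.
If you test both sides of the distribution ($\mu \neq \bar{x}$, so $\mu$ can be either smaller or bigger), you need to perform a two-tailed test.
Each purple region (one rejection region per tail) would be calculated as $\alpha/2$. When testing a single side, as in this example, you use a one-tailed test and the whole of $\alpha$ sits in one tail.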
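The one-tailed and two-tailed rejection regions can also be expressed as critical z-values. The sketch below assumes the lesson's alpha of 0.05 and uses `stats.norm.ppf`, the inverse of the cumulative distribution function; the variable names are illustrative only.

```python
from scipy import stats

alpha = 0.05  # significance level used in this lesson

# One-tailed (upper) test: all of alpha sits in the right tail
z_crit_one_tailed = stats.norm.ppf(1 - alpha)

# Two-tailed test: alpha is split evenly, alpha/2 in each tail
z_crit_two_tailed = stats.norm.ppf(1 - alpha / 2)

print(f"one-tailed critical z: {z_crit_one_tailed:.3f}")      # 1.645
print(f"two-tailed critical z: +/-{z_crit_two_tailed:.3f}")   # 1.960
```

A z-statistic beyond the relevant critical value falls in the rejection region, which is equivalent to its p-value being smaller than alpha.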
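For z-tests, a z-statistic is used as the test statistic. A one-sample z-statistic is calculated as:
$$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
This formula slightly differs from the standard score formula. It includes the square root of $n$ because we are dealing with the distribution of the sample mean, whose standard deviation (the standard error) is $\sigma / \sqrt{n}$.
Now, all you need to do is use this formula given your sample mean $\bar{x}$, the population mean $\mu$, the population standard deviation $\sigma$, and the number of items in the sample $n$.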
Let's use Python to calculate this.
import scipy.stats as stats
from math import sqrt
x_bar = 103 # sample mean
n = 40 # number of students
sigma = 16 # sd of population
mu = 100 # Population mean
z = (x_bar - mu)/(sigma/sqrt(n))
z
1.1858541225631423
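As a check on the computed value, the same arithmetic can be written out by hand with the numbers from the scenario (intermediate values are rounded):

$$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \dfrac{103 - 100}{16 / \sqrt{40}} \approx \dfrac{3}{2.530} \approx 1.186$$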
Let's try to plot this:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before Matplotlib 3.6
# Shade the area under the standard normal curve below the z-statistic
plt.fill_between(x=np.arange(-4, 1.19, 0.01),
                 y1=stats.norm.pdf(np.arange(-4, 1.19, 0.01)),
                 facecolor='red',
                 alpha=0.35,
                 label='Area below z-statistic')
# Shade the area above the z-statistic (the upper tail)
plt.fill_between(x=np.arange(1.19, 4, 0.01),
                 y1=stats.norm.pdf(np.arange(1.19, 4, 0.01)),
                 facecolor='blue',
                 alpha=0.35,
                 label='Area above z-statistic')
plt.legend()
plt.title('z-statistic = 1.19');
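Remember that a z-table lists the cumulative probability up to a given z-score. Instead of looking the value up, you can use scipy.stats to calculate it directly.
In SciPy, the cumulative probability up to the z-score is given by stats.norm.cdf: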
stats.norm.cdf(z)
0.8821600432854813
The area under the curve up to a z-score of 1.19 is about 88.2%.
Mathematically, you want to get the p-value, and this can be done by subtracting the cumulative probability from 1, because the total area under the curve is 1:
pval = 1 - stats.norm.cdf(z)
pval
0.11783995671451875
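The steps so far can be gathered into one small function. This is a sketch only: `one_sample_z_test` and its `alternative` parameter are hypothetical names introduced here, not part of SciPy. It uses `stats.norm.sf`, the survival function, which equals `1 - stats.norm.cdf` for the upper tail.

```python
from math import sqrt
from scipy import stats


def one_sample_z_test(x_bar, mu, sigma, n, alternative="greater"):
    """Return (z, p_value) for a one-sample z-test.

    Hypothetical helper, not a SciPy function. `alternative` is one of
    "greater", "less", or "two-sided".
    """
    z = (x_bar - mu) / (sigma / sqrt(n))
    if alternative == "greater":
        p_value = stats.norm.sf(z)            # upper-tail area, same as 1 - cdf(z)
    elif alternative == "less":
        p_value = stats.norm.cdf(z)           # lower-tail area
    elif alternative == "two-sided":
        p_value = 2 * stats.norm.sf(abs(z))   # area in both tails
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value


z_stat, p_one = one_sample_z_test(103, 100, 16, 40)
_, p_two = one_sample_z_test(103, 100, 16, 40, alternative="two-sided")
print(f"z = {z_stat:.3f}, one-tailed p = {p_one:.3f}, two-tailed p = {p_two:.3f}")
# z = 1.186, one-tailed p = 0.118, two-tailed p = 0.236
```

The two-tailed p-value is twice the one-tailed value here, so a two-tailed test of the same data is even further from significance.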
Our p-value (0.12) is larger than the alpha of 0.05. So what does that mean? Can you not conclude that tutoring leads to an IQ increase?
Well, you still can't really say that for sure. What you can say is that there is not enough evidence to reject the null hypothesis with the given sample at an alpha of 0.05. You could scale the experiment up and collect more data, or apply other sampling techniques, to get a clearer picture of the real impact.
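To see why collecting more data matters, the sketch below asks how large the sample would need to be for the same 3-point difference to become significant. It rests on a loud assumption: that a larger sample would still have a mean of exactly 103, which real data would not guarantee. The variable names are illustrative.

```python
from math import ceil, sqrt
from scipy import stats

mu, sigma = 100, 16   # population mean and standard deviation from the scenario
x_bar = 103           # assumption: a larger sample shows the same mean
alpha = 0.05

# Reject H0 when z = (x_bar - mu) / (sigma / sqrt(n)) exceeds the critical value,
# so solve for n: sqrt(n) > z_crit * sigma / (x_bar - mu)
z_crit = stats.norm.ppf(1 - alpha)
n_required = ceil((z_crit * sigma / (x_bar - mu)) ** 2)

# Recompute the test at that sample size to confirm it crosses the threshold
z_at_n = (x_bar - mu) / (sigma / sqrt(n_required))
p_at_n = stats.norm.sf(z_at_n)
print(f"smallest n with p < alpha: {n_required}")   # 77
```

Under that assumption, roughly twice the original 40 students would be needed, because the standard error shrinks only with the square root of the sample size.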
And even when the sample data lets you reject the null hypothesis, you still cannot be 100% sure of the outcome. What you can say is that, given the evidence, the results show a statistically significant increase in IQ for the tutored students, rather than stating flatly that "tutoring improves IQ".
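In this lesson, you learned to run a one-sample z-test: state the null and alternative hypotheses, choose a significance level, calculate the z-statistic, and use the p-value to decide whether to reject the null hypothesis.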