In this lesson, we will introduce a special case of normal distributions: "The Standard Normal Distribution".
You will be able to:
- Compare and contrast the normal and standard normal distribution
- Calculate and interpret the z-score (standard score) for an observation from normally distributed data
- Convert between a normal and a standard normal distribution
Previously, you learned about the normal (or Gaussian) distribution, which is characterized by a bell-shaped curve. We also identified the mean and standard deviation as the defining parameters of this distribution. As mentioned before, different normal distributions can have different means and standard deviations.
The standard normal distribution, however, is a special case: it is a normal distribution with a mean of 0 and a standard deviation of 1 ($\mu = 0$, $\sigma = 1$).
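Here is a minimal sketch that makes the two parameterizations concrete. It uses Python's standard-library `statistics.NormalDist` class (Python 3.8+), which is an assumption of this example and not a tool used elsewhere in the lesson. Called with no arguments, it defaults to `mu=0.0` and `sigma=1.0`, which is the standard normal distribution. The general case borrows the mean of 50 and standard deviation of 10 from the test-score example later in this lesson.

```python
from math import pi, sqrt
from statistics import NormalDist

# The standard normal distribution: NormalDist() defaults to mu=0.0, sigma=1.0
standard = NormalDist()

# A general normal distribution with mean 50 and standard deviation 10
general = NormalDist(mu=50, sigma=10)

print(standard.mean, standard.stdev)  # 0.0 1.0
print(general.mean, general.stdev)    # 50.0 10.0

# The standard normal density peaks at z = 0 with height 1 / sqrt(2 * pi)
print(standard.pdf(0), 1 / sqrt(2 * pi))
```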
Plotted over a range of values, the cumulative distribution function (CDF) of the standard normal distribution is an S-shaped curve that rises from 0 toward 1 and passes through 0.5 at the mean, $z = 0$.
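You can also check the shape of that curve numerically. This sketch again assumes `statistics.NormalDist`. Its `cdf(z)` method returns the area under the standard normal curve to the left of `z`.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

# Evaluate the CDF at a few points: the values climb from near 0 to near 1
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"Phi({z:+d}) = {standard.cdf(z):.4f}")
```

By symmetry, `cdf(-z)` equals `1 - cdf(z)`, and `cdf(0)` is exactly 0.5.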
Thinking back to the standard deviation rule for normal distributions:
- About 68% of the area lies within 1 standard deviation of the mean, or mathematically speaking, 68% is in the interval $[\mu-\sigma, \mu+\sigma]$
- About 95% of the area lies within 2 standard deviations of the mean, or mathematically speaking, 95% is in the interval $[\mu-2\sigma, \mu+2\sigma]$
- About 99.7% of the area lies within 3 standard deviations of the mean, or mathematically speaking, 99.7% is in the interval $[\mu-3\sigma, \mu+3\sigma]$

With a standard normal distribution ($\mu = 0$, $\sigma = 1$), these intervals become:
- About 68% of the area lies between -1 and 1.
- About 95% of the area lies between -2 and 2.
- About 99.7% of the area lies between -3 and 3.
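A short sketch can verify this rule. It again assumes the standard-library `statistics.NormalDist`. The area between $-k$ and $k$ is the CDF at $k$ minus the CDF at $-k$.

```python
from statistics import NormalDist

standard = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean: Phi(k) - Phi(-k)
areas = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"P({-k} <= Z <= {k}) = {area:.4f}")
```

The printed areas round to 0.6827, 0.9545 and 0.9973. These are the sources of the 68%, 95% and 99.7% figures.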
This simplicity makes a standard normal distribution very desirable to work with. The exciting news is that you can very easily transform any normal distribution to a standard normal distribution!
The standard score (more commonly referred to as a $z$-score) is a very useful statistic because it allows you to:
- Calculate the probability of a score falling at or below (or above) a given value within a normal distribution, and
- Compare two scores that come from different normal distributions.
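Any normal distribution can be converted to a standard normal distribution, and vice versa, using this equation:
$$z = \frac{x - \mu}{\sigma}$$
Here, $x$ is an observation from the original normal distribution, $\mu$ is the mean, and $\sigma$ is the standard deviation of the original normal distribution.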
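Here is a minimal Python sketch of the formula. It is applied to the second use case listed above: comparing scores from two different normal distributions. The function name `z_score` and both exam distributions are hypothetical and chosen only for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma

# Hypothetical example: one student takes two differently scaled exams
z_a = z_score(80, mu=70, sigma=5)   # exam A: mean 70, standard deviation 5
z_b = z_score(85, mu=75, sigma=10)  # exam B: mean 75, standard deviation 10
print(z_a, z_b)  # 2.0 1.0
```

The 85 is the higher raw score, but the 80 on exam A is the relatively stronger result. It lies 2 standard deviations above its mean. The 85 on exam B lies only 1 standard deviation above its mean.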
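The standard normal distribution is sometimes called the $z$-distribution. A $z$-score tells you how many standard deviations an observation lies above (positive $z$) or below (negative $z$) the mean.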
Imagine some test results follow a normal distribution with a mean score of 50 and a standard deviation of 10.
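One of the students scored a 70 on the test. Using this information and the $z$-score formula, you can compute the student's $z$-score:
$$z = \frac{70 - 50}{10} = 2$$
Transforming the test result of 70 into a $z$-score of 2 tells you that the score lies 2 standard deviations above the mean. About 95% of scores fall within 2 standard deviations of the mean, so this student scored higher than the large majority of test takers.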
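In summary, calculating the $z$-score shows quickly how extreme a particular result is relative to the rest of its distribution.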
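Visually, suppose you plot the original distribution ($\mu = 50$, $\sigma = 10$) with a vertical line at $x = 70$, next to the standard normal distribution with a vertical line at $z = 2$. The areas under the curve to the left and to the right of the line are identical in both plots.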
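You can check that equality of areas numerically. The sketch below assumes the standard-library `statistics.NormalDist` and uses the test-score example ($\mu = 50$, $\sigma = 10$, $x = 70$). The area to the left of $x = 70$ in the original distribution matches the area to the left of $z = 2$ in the standard normal distribution, up to floating-point rounding.

```python
from math import isclose
from statistics import NormalDist

scores = NormalDist(mu=50, sigma=10)  # original test-score distribution
standard = NormalDist()               # standard normal distribution

x = 70
z = (x - scores.mean) / scores.stdev  # (70 - 50) / 10 = 2.0

# Area left of x in the original distribution vs. left of z in the standard one
left_original = scores.cdf(x)
left_standard = standard.cdf(z)
print(left_original, left_standard)
print(isclose(left_original, left_standard))  # True

# Area to the right of the line: share of scores expected above 70
print(1 - left_standard)  # about 0.0228
```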
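Thinking along these lines, you can also convert a $z$-score back to an original score $x$ by rearranging the same formula:
$$x = \mu + z\sigma$$
For the above example, this works out as:
$$x = 50 + 2 \cdot 10 = 70$$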
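Both directions of the conversion fit in a few lines of Python. This is a minimal sketch, and the helper names `z_score` and `x_from_z` are hypothetical.

```python
def z_score(x, mu, sigma):
    """Standardize an observation: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def x_from_z(z, mu, sigma):
    """Reverse the standardization: x = mu + z * sigma."""
    return mu + z * sigma

mu, sigma = 50, 10          # test-score distribution from the example
z = z_score(70, mu, sigma)  # 2.0
x = x_from_z(z, mu, sigma)  # back on the original scale
print(z, x)  # 2.0 70.0

# Any z-score maps back to the original scale, e.g. 1.5 standard deviations below the mean
print(x_from_z(-1.5, mu, sigma))  # 35.0
```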
Data standardization is a common preprocessing technique. It is used to compare observations from different normal distributions that may have distinct means and standard deviations.
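Standardization applies the $z$-score transformation to every observation, so that each standardized dataset has a mean of 0 and a standard deviation of 1.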
Let's look at a quick example. First, we'll randomly generate two normal distributions with different means and standard deviations, drawing 1000 observations from each. Next, we'll use seaborn to plot the results.
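```python
import numpy as np
import seaborn as sns

mean1, sd1 = 5, 3   # distribution 1: mean 5, standard deviation 3
mean2, sd2 = 10, 2  # distribution 2: mean 10, standard deviation 2

# Draw 1000 random observations from each distribution
d1 = np.random.normal(mean1, sd1, 1000)
d2 = np.random.normal(mean2, sd2, 1000)

# Histogram + KDE curve for each sample on shared axes
# (sns.distplot is deprecated in recent seaborn releases; histplot replaces it)
sns.histplot(d1, kde=True, stat="density");
sns.histplot(d2, kde=True, stat="density");
```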
You can see that these distributions differ from each other and are not directly comparable.
For a number of machine learning algorithms and data visualization techniques, you need to remove the effect of the data's scale before you start building your model. Standardization does this by converting each distribution into a $z$-distribution with a mean of 0 and a standard deviation of 1.
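```python
# Standardizing and visualizing the distributions:
# subtract each sample's mean and divide by its standard deviation (vectorized)
sns.histplot((d1 - d1.mean()) / d1.std(), kde=True, stat="density");
sns.histplot((d2 - d2.mean()) / d2.std(), kde=True, stat="density");
```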
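The plots show the effect only by eye. As a numerical check, the sketch below regenerates the two samples with NumPy's `default_rng` (NumPy 1.17+). The seed 42 is an arbitrary assumption, used only for reproducibility. The sketch standardizes both samples with vectorized arithmetic and prints their means and standard deviations. `ndarray.std()` defaults to the population standard deviation (`ddof=0`), and the transformation uses the same function. As a result, each standardized sample has a standard deviation of 1, up to floating-point rounding, and a mean that is essentially 0.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator so the run is reproducible

d1 = rng.normal(5, 3, 1000)   # same parameters as distribution 1 above
d2 = rng.normal(10, 2, 1000)  # same parameters as distribution 2 above

# Vectorized z-score transformation (population std, ddof=0)
z1 = (d1 - d1.mean()) / d1.std()
z2 = (d2 - d2.mean()) / d2.std()

# After standardization both samples have mean ~0 and standard deviation 1
for name, z in (("d1", z1), ("d2", z2)):
    print(f"{name}: mean = {z.mean():.3f}, std = {z.std():.3f}")
```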
Both distributions are now directly comparable on a common standard scale. As mentioned earlier, this technique is useful when you prepare data for machine learning algorithms.
Convert the standardized distributions back to the original normal distributions using the formula $x = \mu + z\sigma$. Visualize them to check that you recover your original distributions.
In this lesson, you learned about a special case of the normal distribution called the standard normal distribution.
You also learned how to convert any normal distribution to a standard normal distribution using the $z$-score formula.