The normal distribution, also known as the bell curve, has been used (and abused) as a model for a wide variety of phenomena, to the point that some have the impression that any data that do not fit this model are in some way abnormal. That is not the case. The name normal distribution comes from the title of the paper in which Carl Friedrich Gauss first described the mathematical properties of the bell curve, ``On the Normal Distribution of Errors''. For this reason, the distribution is sometimes referred to as the Gaussian distribution; perhaps that name would be less misleading. The main importance of this model comes from the central role it plays in the behavior of many statistics that are derived from large samples.
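The normal distribution represents a family of distribution functions,
parametrized by the mean $\mu$ and standard deviation $\sigma$, denoted by $N(\mu,\sigma)$.
The density function for this distribution is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty. \]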
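As a minimal sketch, the normal density can be evaluated directly from its formula. The function name `normal_pdf` and the parameter values below are illustrative assumptions, not part of these notes; Python's standard-library `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative parameters (assumed for this sketch only).
mu, sigma = 50.0, 10.0

# The curve peaks at the mean and is symmetric about it.
peak = normal_pdf(mu, mu, sigma)
left = normal_pdf(mu - 10, mu, sigma)
right = normal_pdf(mu + 10, mu, sigma)

# Cross-check against the standard library's implementation.
reference = NormalDist(mu, sigma).pdf(mu + 10)
print(peak, left, right, reference)
```

Changing `mu` shifts the curve along the axis, while changing `sigma` stretches or compresses it; the total area under the curve stays equal to 1.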
The probability that a continuous random variable is contained within an interval
is modeled by the area under the density curve over that interval. Suppose,
for example, that we have a random variable $X$ that has a $N(\mu,\sigma)$ distribution and we
are interested in the probability that this r.v. takes a value between 45
and 60. The problem now is to determine this area. Unfortunately (or perhaps
fortunately from the point of view of students) the normal density function
does not have a closed-form antiderivative. This implies that we must either use a
set of tabulated values to obtain areas under the curve or use a computer
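routine to determine the areas. One property satisfied by the family of normal
distributions is closure under linear transformations. That is, if
$X \sim N(\mu,\sigma)$, and if $Y = a + bX$ with $b \neq 0$, then $Y \sim N(a + b\mu,\ |b|\sigma)$. We
can make use of this property by noting that
\[ Z = \frac{X-\mu}{\sigma} \sim N(0,1). \]
For example, if $X \sim N(\mu,\sigma)$, then
\[ P(45 < X < 60) = P\left(\frac{45-\mu}{\sigma} < Z < \frac{60-\mu}{\sigma}\right). \]
As can be seen by comparing these two plots, the area for $X$ between 45 and 60 and the area for $Z$ between the corresponding z-scores
are the same. Therefore, it is only necessary to tabulate areas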
for the standard normal distribution. The textbook contains such a table on
page 789. This table gives areas under the standard normal curve below
$z$ for $z \ge 0$. Using this table for negative values of $z$ requires an additional property of
normal distributions called symmetry:
\[ P(Z \le -z) = P(Z \ge z) = 1 - P(Z \le z). \]
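Where a table is not at hand, the same areas can be obtained from a computer routine. The sketch below is one way to do this, assuming Python and its standard-library error function `math.erf`, via the identity $\Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$ for the standard normal area below $z$. The helper names `phi` and `prob_between`, and the values $\mu = 50$ and $\sigma = 10$, are hypothetical choices for illustration; they are not the parameters used in these notes.

```python
import math

def phi(z):
    """Area under the standard normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma), found by standardizing both endpoints."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Hypothetical parameters, assumed only for this illustration.
mu, sigma = 50.0, 10.0
area = prob_between(45, 60, mu, sigma)

# Symmetry: the area below -z equals one minus the area below z.
z = 1.25
lower_tail = phi(-z)
print(round(area, 4), round(lower_tail, 4), round(1 - phi(z), 4))
```

Standardizing first means a single routine for the standard normal curve serves every member of the family, which is the same reason one table is enough.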
Example. Suppose a questionnaire designed to assess employee
satisfaction with working conditions is given to the employees of a large
corporation, and that the scores on this questionnaire are approximately
normally distributed with mean 120 and standard deviation 18.
a) Find the proportion of employees who scored below 150.
b) Find the proportion of employees who scored between 140 and 160.
c) What proportion scored above 105?
d) What proportion scored between 90 and 145?
These areas are represented in the plots given below.
e) 15% of employees scored below what value?
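a) First transform to $N(0,1)$: $z = (150-120)/18 = 1.67$. From the table, the area below 1.67 is 0.9525, so about 95% of the employees scored below 150.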
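All five parts can be checked numerically. As a hedged sketch, the code below uses Python's standard-library `statistics.NormalDist` (available from Python 3.8); the variable names are illustrative. Because the routine does not round z to two decimals, its answers may differ from table lookups in the third or fourth decimal place.

```python
from statistics import NormalDist

# Questionnaire scores: approximately normal with mean 120 and standard deviation 18.
scores = NormalDist(mu=120, sigma=18)

# a) proportion scoring below 150
part_a = scores.cdf(150)
# b) proportion scoring between 140 and 160
part_b = scores.cdf(160) - scores.cdf(140)
# c) proportion scoring above 105
part_c = 1 - scores.cdf(105)
# d) proportion scoring between 90 and 145
part_d = scores.cdf(145) - scores.cdf(90)
# e) score below which 15% of employees fall (the 15th percentile)
part_e = scores.inv_cdf(0.15)

for label, value in zip("abcde", (part_a, part_b, part_c, part_d, part_e)):
    print(f"{label}) {value:.4f}")
```

Parts a) through d) go from a score to an area, so they use `cdf`; part e) goes from an area back to a score, so it uses the inverse, `inv_cdf`.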
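Since z-scores represent the number of standard deviations from the mean, and since they are directly associated with percentiles, they can be used to determine the relative standing of an observation from a normally distributed population. In particular, consider the following three intervals: $(\mu-\sigma,\ \mu+\sigma)$, $(\mu-2\sigma,\ \mu+2\sigma)$, and $(\mu-3\sigma,\ \mu+3\sigma)$. After converting these intervals to z-scores, they become, respectively, $(-1,1)$, $(-2,2)$, and $(-3,3)$. Because of the symmetry property, the probabilities for these intervals are, using the table areas below 1, 2, and 3,
\[ P(-1 < Z < 1) = 2(0.8413) - 1 = 0.6826, \]
\[ P(-2 < Z < 2) = 2(0.9772) - 1 = 0.9544, \]
\[ P(-3 < Z < 3) = 2(0.9987) - 1 = 0.9974. \]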
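A quick numerical check of the probabilities for these three intervals, sketched with Python's standard-library `statistics.NormalDist`; the function name `within_k_sd` is an illustrative assumption. The check also shows that the answer does not depend on which mean and standard deviation are chosen.

```python
from statistics import NormalDist

def within_k_sd(mu, sigma, k):
    """P(mu - k*sigma < X < mu + k*sigma) for X ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Standard normal, and the questionnaire distribution from the example.
standard = [within_k_sd(0, 1, k) for k in (1, 2, 3)]
questionnaire = [within_k_sd(120, 18, k) for k in (1, 2, 3)]

for k, p in zip((1, 2, 3), standard):
    print(f"within {k} sd: {p:.4f}")
```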
Suppose that in the previous example an employee scored 82 on the employee satisfaction survey. The z-score for 82 is $(82-120)/18 = -2.11$, so this score is more than 2 standard deviations below the mean. Since about 95% of the scores are within 2 standard deviations of the mean, this is a relatively low score. We could be more specific by determining the percentile rank for this score. From the table of normal curve areas, the area below 2.11 is 0.9826, so the area below $-2.11$ is $1 - 0.9826 = 0.0174$. That is, only 1.74% of those who took this questionnaire scored this low or lower.
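The same percentile rank can be sketched in code. This assumes Python's standard-library `statistics.NormalDist`; rounding z to two decimals mimics a table lookup, and the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 120, 18
score = 82

# Number of standard deviations the score lies from the mean.
z = (score - mean) / sd

# Percentile rank using z rounded to two decimals, as a table lookup would.
rank_table_style = NormalDist().cdf(round(z, 2))

# Percentile rank without rounding z.
rank_exact = NormalDist(mean, sd).cdf(score)

print(f"z = {z:.2f}, table-style rank = {rank_table_style:.4f}, unrounded rank = {rank_exact:.4f}")
```

The two ranks agree to about three decimal places; the small gap comes only from rounding z before the lookup.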