The **normal distribution**, also known as the *Bell Curve*, has
been used (and abused) as a model for a wide variety of phenomena to the point
that some have the impression that any data that does not fit this model is
in some way *abnormal*. That is not the case. The name *normal
distribution* comes from the title of the paper Carl Friedrich Gauss wrote
that first described the mathematical properties of the bell curve, "On the
Normal Distribution of Errors". For this reason, the distribution is
sometimes referred to as the **Gaussian distribution**. Perhaps that
name would be less misleading. The main importance of this model comes from the
central role it plays in the behavior of many statistics that are derived from
large samples.

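The normal distribution represents a family of distribution functions,
parametrized by the mean $\mu$ and standard deviation $\sigma$, denoted by $N(\mu,\sigma)$.
The density function for this distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$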

The mean is referred to as a location parameter since it determines the location of the peak of the curve. The standard deviation is referred to as a scale parameter since it determines how spread out or concentrated the curve is. The plots below illustrate these properties. In the first plot, the means differ but the standard deviations are all the same. In the second plot, both the means and the standard deviations differ.
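The location and scale behavior can also be checked numerically. The following is a minimal sketch in Python; the function name `normal_pdf` and the parameter values are illustrative assumptions, not part of the original notes.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Changing the mean only moves the peak; the height at the peak is unchanged.
peak_a = normal_pdf(0.0, 0.0, 1.0)   # peak of the curve with mean 0, sd 1
peak_b = normal_pdf(5.0, 5.0, 1.0)   # peak of the curve with mean 5, sd 1

# Doubling the standard deviation spreads the curve out and halves its peak.
peak_c = normal_pdf(0.0, 0.0, 2.0)

print(peak_a, peak_b, peak_c)
```

Evaluating the density at each curve's mean gives the height of its peak, so comparing the three values shows the effect of each parameter separately.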

The probability that a continuous random variable falls within an interval
is modeled by the area under the density curve over that interval. Suppose,
for example, that we have a random variable $X$ that has a normal distribution and we
are interested in the probability that $X$ takes a value between 45
and 60. The problem now is to determine this area. Unfortunately (or perhaps
fortunately from the point of view of students) the normal density function
does not have a closed-form antiderivative. This implies that we must either use a
set of tabulated values to obtain areas under the curve or use a computer
routine to determine the areas. One property satisfied by the family of normal
distributions is *closure under linear transformations*. That is, if
$X$ is normal with mean $\mu$ and standard deviation $\sigma$, written $X \sim N(\mu,\sigma)$, and if $Y = a + bX$ with $b \neq 0$, then
$Y \sim N(a + b\mu, |b|\sigma)$. We
can make use of this property by noting that

$$Z = \frac{X-\mu}{\sigma}$$

has a $N(0,1)$ distribution. This distribution is referred to as the
**standard normal distribution**.

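This shows that the probability that $X$ falls in an interval equals the probability that the standardized variable $Z = (X-\mu)/\sigma$ falls in the corresponding standardized interval:

$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right).$$

For example, if $X \sim N(\mu,\sigma)$, then

$$P(45 < X < 60) = P\left(\frac{45-\mu}{\sigma} < Z < \frac{60-\mu}{\sigma}\right).$$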

As can be seen by comparing these two plots, the area under the original normal curve and the corresponding area under the standard normal curve
are the same. Therefore, it is only necessary to tabulate areas
for the standard normal distribution. The textbook contains such a table on
page 789. This table gives areas under the standard normal curve below
*z* for $z \ge 0$. Using the table for negative values of *z* requires an additional property of
normal distributions called symmetry:

$$P(Z < -z) = P(Z > z) = 1 - P(Z < z).$$
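As an alternative to a printed table, the standard normal areas can be computed with a short routine. The sketch below assumes Python and uses the standard identity relating the normal area to the error function `math.erf`; the function names `std_normal_cdf` and `normal_cdf` are illustrative, not from the text.

```python
import math

def std_normal_cdf(z):
    """Area under the standard normal curve below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """Area below x for a normal curve, found by standardizing first."""
    return std_normal_cdf((x - mu) / sigma)

# Symmetry: the area below -z equals the area above z.
z = 0.83
lower_tail = std_normal_cdf(-z)
upper_tail = 1.0 - std_normal_cdf(z)

print(lower_tail, upper_tail)
```

Because `normal_cdf` only standardizes and then calls `std_normal_cdf`, it mirrors the reason a single table is sufficient for every normal distribution.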

**Example**. Suppose a questionnaire designed to assess employee
satisfaction with working conditions is given to the employees of a large
corporation, and that the scores on this questionnaire are approximately
normally distributed with mean 120 and standard deviation 18.

**a)** Find the proportion of employees who scored below 150.

**b)** Find the proportion of employees who scored between 140 and 160.

**c)** What proportion scored above 105?

**d)** What proportion scored between 90 and 145?

These areas are represented in the plots given below.

**e)** 15% of employees scored below what value?

**Solutions**

**a)** First transform 150 to a z-score:

$$z = \frac{150 - 120}{18} = 1.67.$$

From the table on the inside back cover of the text, the area below 1.67 is 0.9525. Therefore,

$$P(X < 150) = P(Z < 1.67) = 0.9525.$$

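**b)** The z-scores for 140 and 160 are $(140-120)/18 = 1.11$ and $(160-120)/18 = 2.22$. In this case we must subtract the area below 1.11 from the area below 2.22. From the table these areas are, respectively, 0.8665 and 0.9868. This gives

$$P(140 < X < 160) = P(1.11 < Z < 2.22) = 0.9868 - 0.8665 = 0.1203.$$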

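**c)** The z-score for 105 is $(105-120)/18 = -0.83$. The symmetry property of the normal distribution implies that the area above -0.83 is the same as the area below 0.83, which we get from the table:

$$P(X > 105) = P(Z > -0.83) = P(Z < 0.83) = 0.7967.$$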

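**d)** The z-scores for 90 and 145 are $(90-120)/18 = -1.67$ and $(145-120)/18 = 1.39$. The area we require is the difference between the area below 1.39 and the area below -1.67. By symmetry, the area below -1.67 is the same as the area above 1.67, which is $1 - 0.9525 = 0.0475$. Therefore,

$$P(90 < X < 145) = P(-1.67 < Z < 1.39) = 0.9177 - 0.0475 = 0.8702.$$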
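The areas in parts a) through d) can also be obtained from a computer routine instead of the table. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library; the variable names are illustrative. Because the routine does not round z-scores to two decimals, its results can differ from the table-based answers in the third or fourth decimal place.

```python
from statistics import NormalDist

# Questionnaire scores: approximately normal, mean 120, standard deviation 18.
scores = NormalDist(mu=120, sigma=18)

below_150 = scores.cdf(150)                         # part a)
between_140_160 = scores.cdf(160) - scores.cdf(140) # part b)
above_105 = 1 - scores.cdf(105)                     # part c)
between_90_145 = scores.cdf(145) - scores.cdf(90)   # part d)

for label, area in [("a", below_150), ("b", between_140_160),
                    ("c", above_105), ("d", between_90_145)]:
    print(f"{label}) {area:.4f}")
```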

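**e)** Here the area is given and the score must be found. From the table, the area below 1.04 is 0.8508, the closest entry to 0.85, so by symmetry the area below $z = -1.04$ is approximately 0.15. Converting this z-score back to the original scale gives

$$x = 120 + (-1.04)(18) = 101.28.$$

If you check this answer by finding the area below 101.28, you will see that the steps we just followed are the same steps we used to find areas but applied in reverse order. Also note that the value 101.28 represents the 15th percentile of this normal distribution. Other percentiles can be obtained similarly.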
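Finding a percentile reverses the area calculation, and a computer routine can do this directly. The following sketch assumes Python 3.8 or later and uses `NormalDist.inv_cdf` from the standard library; the variable names are illustrative. The result is close to, but not identical to, the table-based value of 101.28, since the table rounds the z-score to two decimals.

```python
from statistics import NormalDist

scores = NormalDist(mu=120, sigma=18)

# z-score with 15% of the standard normal area below it
z15 = NormalDist().inv_cdf(0.15)

# Convert back to the questionnaire scale: x = mean + z * sd
p15_from_z = 120 + 18 * z15

# The same percentile obtained in one step
p15 = scores.inv_cdf(0.15)

# Check by going forward again: the area below p15 should be 0.15
check = scores.cdf(p15)

print(z15, p15, check)
```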

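Since z-scores represent the number of standard deviations from the mean, and since they are directly associated with percentiles, they can be used to determine the relative standing of an observation from a normally distributed population. In particular, consider the following three intervals: $(\mu-\sigma,\ \mu+\sigma)$, $(\mu-2\sigma,\ \mu+2\sigma)$, and $(\mu-3\sigma,\ \mu+3\sigma)$. After converting these intervals to z-scores, they become, respectively, (-1,1), (-2,2), and (-3,3). Because of the symmetry property, the probabilities for these intervals are

$$\begin{aligned}
P(-1 < Z < 1) &= 2(0.8413) - 1 = 0.6826,\\
P(-2 < Z < 2) &= 2(0.9772) - 1 = 0.9544,\\
P(-3 < Z < 3) &= 2(0.9987) - 1 = 0.9974.
\end{aligned}$$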

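This is the basis for the **empirical rule**: for data that are approximately normally distributed, about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.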
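The probabilities for the intervals (-1,1), (-2,2), and (-3,3) can be verified with a few lines of code. This is a small sketch assuming Python 3.8 or later; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between -k and k for k = 1, 2, 3 standard deviations
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} sd: {area:.4f}")
```

By symmetry each area also equals `2 * Z.cdf(k) - 1`, which is the form used when reading the values from a table.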

Suppose that in the previous example an employee scored 82 on the employee satisfaction survey. The z-score for 82 is $(82-120)/18 = -2.11$, so this score is more than 2 standard deviations below the mean. Since about 95% of the scores are within 2 standard deviations of the mean, this is a relatively low score. We could be more specific by determining the percentile rank for this score. From the table of normal curve areas, the area below 2.11 is 0.9826, so by symmetry the area below -2.11 is $1 - 0.9826 = 0.0174$. That is, only 1.74% of those who took this questionnaire scored this low or lower.
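The z-score and percentile rank for a single observation can be computed the same way. The sketch below assumes Python 3.8 or later; the names `score`, `z`, and `rank` are illustrative.

```python
from statistics import NormalDist

mean, sd = 120, 18
score = 82

# Number of standard deviations the score lies from the mean
z = (score - mean) / sd

# Percentile rank: proportion of scores at or below this one
rank = NormalDist(mu=mean, sigma=sd).cdf(score)

print(f"z = {z:.2f}, percentile rank = {100 * rank:.2f}%")
```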

2017-11-16