The results of the previous section are derived from the Central Limit Theorem. We can use similar methods to estimate the mean of a population. We will first consider this estimation problem when the population has a normal distribution, and then we will examine the extension of these methods to populations that are not necessarily normally distributed.
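Recall that if the population has a normal distribution with mean \(\mu\)
and standard deviation \(\sigma\), then the distribution of the sample mean
\(\overline{X}\) is normal with mean \(\mu\) and standard deviation
\(\sigma/\sqrt{n}\). This implies that we can use \(\overline{X}\) as an
estimate of \(\mu\). The error of estimation is then
\(|\overline{X} - \mu|\), and we can make the following probability
statement about this error,
\[ P\left( |\overline{X} - \mu| \le z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha, \]
where \(z_{\alpha/2}\) is the upper \(\alpha/2\) critical value of the
standard normal distribution. Equivalently,
\(\overline{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}\) is a
\(100(1-\alpha)\%\) confidence interval for \(\mu\).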
The problem here is that this confidence interval depends on \(\sigma\), the
population standard deviation. In most situations, \(\sigma\) is unknown, as
is \(\mu\). Sometimes we have prior information available that gives an upper
bound for \(\sigma\), say \(\sigma \le \sigma_0\), which can be incorporated
into the confidence interval,
\[ \overline{X} \pm z_{\alpha/2}\,\frac{\sigma_0}{\sqrt{n}}. \]
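As a rough illustration of the normal-based interval with a known standard deviation (or an upper bound used in its place), the following sketch computes the interval from summary statistics. The function name `z_interval` and the input values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Normal-based interval for the mean when sigma (or a bound on it) is known."""
    alpha = 1 - confidence
    # Upper alpha/2 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Illustrative values only: sample mean 50, sigma (or its bound) 8, n = 16.
lower, upper = z_interval(xbar=50.0, sigma=8.0, n=16)
print(f"95% interval: ({lower:.2f}, {upper:.2f})")
```

Passing an upper bound in place of the true standard deviation can only widen the interval, so the stated confidence level is then conservative.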
The problem of an unknown \(\sigma\) was solved in the early 1900s by a
statistician named William Gosset, who worked on it while employed by the
Guinness brewery. Because of non-disclosure agreements in his employment
contract, Gosset had to publish his work under the pseudonym Student. For this
reason, the distribution of
\[ T = \frac{\overline{X} - \mu}{S/\sqrt{n}} \]
when \(X_1, \ldots, X_n\) is a random sample from a normal distribution is
called Student's t distribution. Here \(S\) is the sample standard deviation.
This distribution is similar to the standard normal distribution and
represents an adjustment to the sampling distribution of
\((\overline{X} - \mu)/(\sigma/\sqrt{n})\) caused by replacing the constant
\(\sigma\) with a random variable \(S\). As the sample size increases, \(S\)
becomes a better estimate of \(\sigma\), and so less adjustment is required.
Therefore, the t-distribution depends on the sample size. This dependence is
expressed by a function of the sample size called degrees of freedom,
which for this problem is \(n-1\). That is, the sampling distribution of
\(T\) is a t-distribution with \(n-1\) degrees of freedom. A plot that
compares several t-distributions with the standard normal distribution is given
below. Note that the t-distribution is symmetric and has relatively more area
in the extremes and less area in the central region compared to the standard
normal distribution. Also, as the degrees of freedom increase, the
t-distribution converges to the standard normal distribution.
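The shape comparison described above can also be checked numerically. The sketch below evaluates the t density at the center (0) and in the tail (3) for a few degrees of freedom and prints the standard normal density alongside. The helper `t_pdf` is written here from the standard formula for the t density and is not part of the original notes. The degrees of freedom shown are arbitrary choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so large df does not overflow the gamma function.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

normal = NormalDist()  # standard normal: mean 0, standard deviation 1

print("df      density at 0   density at 3")
for df in (2, 5, 30, 1000):
    print(f"{df:<7} {t_pdf(0.0, df):<14.4f} {t_pdf(3.0, df):.4f}")
print(f"normal  {normal.pdf(0.0):<14.4f} {normal.pdf(3.0):.4f}")
```

For small degrees of freedom the density is lower at 0 and higher at 3 than the normal density, and the two columns approach the normal values as the degrees of freedom grow.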
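We can now make use of Gosset's result to obtain a confidence interval for
\(\mu\),
\[ \overline{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}, \]
where \(t_{\alpha/2,\,n-1}\) is the upper \(\alpha/2\) critical value of the
t-distribution with \(n-1\) degrees of freedom.
The probability statement associated with this confidence interval is
\[ P\left( \overline{X} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \overline{X} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \right) = 1 - \alpha. \]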
Sample size determination.
If our estimate must satisfy requirements both for the level of confidence and
for the precision of the estimate, then it is necessary to have some prior
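information that gives a bound on \(\sigma\) or an estimate of \(\sigma\).
Let \(\sigma_0\) denote this bound or estimate, and let \(e\) denote the
required precision. Then the confidence interval must have the form
\(\overline{X} \pm e\), which implies that
\[ z_{\alpha/2}\,\frac{\sigma_0}{\sqrt{n}} \le e, \quad \text{that is,} \quad n \ge \left( \frac{z_{\alpha/2}\,\sigma_0}{e} \right)^2, \]
where \(z_{\alpha/2}\) is the upper \(\alpha/2\) critical value of the
standard normal distribution. In practice \(n\) is rounded up to the next
integer.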
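The sample size calculation can be sketched in a few lines. This assumes the normal critical value is used, as is usual when the sample size is not yet known. The function name `required_sample_size` and the inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma0, e, confidence=0.95):
    """Smallest n for which z * sigma0 / sqrt(n) is at most the precision e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Round up: a fractional sample size must be increased to the next integer.
    return ceil((z * sigma0 / e) ** 2)

# Illustrative values only: sigma bound 10, precision 2, 95% confidence.
n = required_sample_size(sigma0=10.0, e=2.0, confidence=0.95)
print(n)
```

Because the required size grows with the square of \(\sigma_0/e\), halving the allowed error roughly quadruples the sample size.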
Example. A random sample of 22 existing home sales during the last
month showed that the mean difference between list price and sales price was
$4580 with a standard deviation of $1150. Assume that the differences between
list and sales prices have approximately a normal distribution and construct a
95% confidence interval for the mean difference for all existing home sales.
What would you say if the mean difference between list and sales prices for the
same month last year had been $5500? Suppose you wish to estimate this mean
to within $250 with 99% confidence. What sample size would be required if
you use the standard deviation of this sample as an estimate of \(\sigma\)?
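Solution. The confidence interval has the form
\[ \overline{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}. \]
Here \(n = 22\), \(\overline{x} = 4580\), \(s = 1150\), and
\(t_{0.025,\,21} \approx 2.080\), so the 95% confidence interval is
\[ 4580 \pm 2.080\,\frac{1150}{\sqrt{22}} \approx 4580 \pm 510, \]
or roughly $4070 to $5090. A value of $5500 lies above this interval, which
suggests that the mean difference has decreased since the same month last
year. For the sample size, \(z_{0.005} \approx 2.576\) and
\[ n \ge \left( \frac{2.576 \times 1150}{250} \right)^2 \approx 140.4, \]
so a sample of about 141 home sales would be required.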
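The arithmetic for this example can be checked numerically. The sketch below uses only the Python standard library, so the t critical value is obtained by integrating the t density with Simpson's rule and solving for the quantile by bisection. This is a stand-in for a t table or a statistics package, not a method from the notes. The helper names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: one half plus the Simpson's rule area over [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Upper alpha/2 critical value, found by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # expand until the quantile is bracketed
        lo, hi = hi, 2 * hi
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from the example.
n, xbar, s = 22, 4580.0, 1150.0

t = t_critical(0.05, n - 1)
margin = t * s / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"t critical value: {t:.3f}")
print(f"95% interval: ({lower:.0f}, {upper:.0f})")
print("5500 inside interval:", lower <= 5500 <= upper)

# Sample size for precision 250 at 99% confidence, using s as the estimate of sigma.
z = NormalDist().inv_cdf(1 - 0.01 / 2)
n_required = math.ceil((z * s / 250.0) ** 2)
print("required sample size:", n_required)
```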
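Since the results discussed above are based on the Central Limit Theorem,
we can apply them in the same way to the problem of estimating the mean of
a population that does not necessarily have a normal distribution, provided
the sample size is large enough for the normal approximation to be
reasonable. This would lead to the same confidence interval for \(\mu\),
\[ \overline{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}. \]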
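One way to get a feel for how well this works for a non-normal population is a small simulation. The sketch below is an illustration under stated assumptions, not part of the original notes. It draws repeated samples of size 50 from an exponential population with mean 1, builds the t-based interval each time, and reports the fraction of intervals that contain the true mean. The critical value 2.0096 is a hard-coded table value for 49 degrees of freedom at 95% confidence. The population, sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

def coverage(n=50, reps=2000, t_crit=2.0096, seed=1):
    """Fraction of t-based intervals that contain the true mean of the population."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of an exponential distribution with rate 1
    hits = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        margin = t_crit * s / math.sqrt(n)
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(f"estimated coverage: {rate:.3f}")
```

For a skewed population such as this one, the estimated coverage is typically somewhat below the nominal 95% at moderate sample sizes and moves closer to it as the sample size grows.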