
Estimation of a Population Variance

In the previous section we used the sample standard deviation as an estimate of the population standard deviation, so it is natural to ask how good this estimate is. Note: the methods in this section apply only if the population is approximately normally distributed.

Suppose we have a population of measurements with mean $\mu$ and variance $\sigma^2$, and we have randomly selected a sample of size n from this population. To determine how to construct confidence intervals for $\sigma^2$, we can use a thought experiment similar to the one we used for the estimation of a population proportion. Suppose we could obtain every possible sample of size n from this population and compute the sample variance of each one. Randomly selecting a single sample of size n and computing its sample variance is then equivalent to randomly selecting a single value from the population of all possible sample variances. Therefore, probability statements about the sample variance can be derived from the distribution of all possible sample variances, much as we did when constructing confidence intervals for a population proportion and a population mean. Statistical theory tells us that if the population is normally distributed, then the statistic

\begin{displaymath}
S_n = \frac{(n-1)s^2}{\sigma^2}
\end{displaymath}

has a chi-square distribution with n-1 degrees of freedom.
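The chi-square claim can be checked empirically in the spirit of the thought experiment above. The sketch below draws many samples from a normal population and computes $S_n$ for each one. The population mean 10, standard deviation 3, sample size 22 and 10000 replications are arbitrary choices made for illustration, not values from these notes. A chi-square distribution with n-1 degrees of freedom has mean n-1 and variance 2(n-1), so the simulated summaries should land close to those values.

```r
# Simulation settings are illustrative assumptions, not values from the notes
set.seed(5311)
n = 22          # sample size
sigma = 3       # population standard deviation
nsim = 10000    # number of simulated samples

# S_n = (n-1)s^2/sigma^2 for each of nsim samples from a N(10, 3^2) population
S = replicate(nsim, (n - 1) * var(rnorm(n, mean = 10, sd = sigma)) / sigma^2)

# Compare the simulated mean and variance with the chi-square values n-1 and 2(n-1)
c(mean(S), n - 1)
c(var(S), 2 * (n - 1))

# Fraction of simulated values below the chi-square 2.5% quantile (should be near 0.025)
mean(S <= qchisq(0.025, n - 1))
```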


Since the chi-square distribution is not symmetric, we need separate lower and upper cutoffs from this distribution such that the area between them equals the required confidence level, $1-\alpha$. Let Clower denote the value from the chi-square distribution with n-1 degrees of freedom such that the area below it is $\alpha/2$, and let Cupper denote the value from the same distribution such that the area above it is $\alpha/2$. Then we can make the following probability statement,

\begin{displaymath}
P(Clower\le \frac{(n-1)s^2}{\sigma^2} \le Cupper) = 1 - \alpha.
\end{displaymath}
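For a concrete case, the sketch below computes Clower and Cupper for 21 degrees of freedom and a 95% level (these particular values are chosen for illustration) and confirms with pchisq() that each tail holds area $\alpha/2$. Note that the two cutoffs, roughly 10.28 and 35.48, are not equally far from the mean of 21, which reflects the asymmetry of the chi-square distribution.

```r
alpha = 0.05   # 95% confidence
df = 21        # n - 1 for a sample of size n = 22

Cl = qchisq(alpha/2, df)        # Clower: area alpha/2 below it
Cu = qchisq(1 - alpha/2, df)    # Cupper: area alpha/2 above it
c(Cl, Cu)                       # roughly 10.28 and 35.48

# Each tail area should be alpha/2 and the area between the cutoffs 1 - alpha
pchisq(Cl, df)
pchisq(Cu, df, lower.tail = FALSE)
pchisq(Cu, df) - pchisq(Cl, df)
```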

The confidence interval for $\sigma^2$ is derived by rearranging the inequalities inside this probability so that $\sigma^2$ stands alone in the middle. Since Clower, Cupper, $s^2$ and $\sigma^2$ are all positive, we can do this one inequality at a time:

\begin{displaymath}
Clower\le \frac{(n-1)s^2}{\sigma^2} \Longleftrightarrow \sigma^2 \le \frac{(n-1)s^2}{Clower},
\end{displaymath}

and

\begin{displaymath}
\frac{(n-1)s^2}{\sigma^2} \le Cupper \Longleftrightarrow \sigma^2 \ge \frac{(n-1)s^2}{Cupper}.
\end{displaymath}
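Written out step by step, the first equivalence only multiplies both sides by $\sigma^2$ and then divides by Clower; because both are positive, neither step reverses the inequality:

\begin{displaymath}
Clower \le \frac{(n-1)s^2}{\sigma^2}
\Longleftrightarrow Clower\,\sigma^2 \le (n-1)s^2
\Longleftrightarrow \sigma^2 \le \frac{(n-1)s^2}{Clower}.
\end{displaymath}

The second equivalence follows in the same way with Cupper in place of Clower and the direction of the inequality swapped.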

Combine these inequalities to obtain

\begin{displaymath}
P(\frac{(n-1)s^2}{Cupper} \le \sigma^2 \le \frac{(n-1)s^2}{Clower}) = 1 - \alpha.
\end{displaymath}

Therefore a $1-\alpha$ confidence interval for $\sigma^2$ is

\begin{displaymath}
\left[ \frac{(n-1)s^2}{Cupper}, \frac{(n-1)s^2}{Clower}\right].
\end{displaymath}
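The interval above can be packaged as a small R function that works directly from a vector of data. The function name var.conf.int and its arguments are hypothetical, introduced only for illustration, and the usage example runs on simulated normal data rather than the data set from these notes.

```r
# Hypothetical helper (name and interface are illustrative, not from the notes):
# chi-square confidence interval for a normal population variance from data x
var.conf.int = function(x, conf.level = 0.95) {
  n = length(x)
  s2 = var(x)                          # sample variance s^2 (divisor n-1)
  alpha = 1 - conf.level
  Cl = qchisq(alpha/2, n - 1)          # Clower: area alpha/2 below it
  Cu = qchisq(1 - alpha/2, n - 1)      # Cupper: area alpha/2 above it
  # Dividing by the larger quantile Cu gives the lower endpoint
  c(lower = (n - 1) * s2 / Cu, upper = (n - 1) * s2 / Cl)
}

# Usage with simulated data (an assumption; the notes' data are not reproduced here)
set.seed(1)
x = rnorm(22, mean = 0, sd = 1150)
ci = var.conf.int(x)
ci          # interval for sigma^2
sqrt(ci)    # corresponding interval for sigma
```

Because the lower endpoint divides by the larger quantile, the interval always contains the sample variance itself, but it is not centered on it.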

Suppose, for example, that we wish to construct a 95% confidence interval for the population variance of the difference between list and sales price, based on the sample of size 22 (with sample standard deviation s = 1150) in the example above. In this case we use a chi-square distribution with 21 degrees of freedom. Chi-square quantiles are obtained in R with the function qchisq(p, df), which returns the value with area p below it.

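alpha = .05                      # 1 - alpha = 0.95 confidence level
n = 22                           # sample size
s = 1150                         # sample standard deviation
Cl = qchisq(alpha/2,n-1)         # Clower: area alpha/2 below it
Cu = qchisq(1-alpha/2,n-1)       # Cupper: area alpha/2 above it
conf.int = (n-1)*s^2/c(Cu,Cl)    # dividing by Cu gives the lower endpoint
conf.int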
[1]  782789.7 2700843.7
Because the square root is an increasing function, taking square roots of both endpoints converts this into a 95% confidence interval for $\sigma$:
sqrt(conf.int)
[1]  884.7541 1643.4244
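A simulation can check how often this interval actually captures $\sigma^2$, and it also illustrates the note at the start of this section that the method depends on normality. The sketch below is an illustration under assumed settings: samples of size 22, 95% intervals, 10000 replications, a normal population with standard deviation 1150, and, for contrast, a right-skewed exponential population with rate 1 (variance 1). The helper covers() is hypothetical. For the normal population the coverage should be close to 0.95; for the exponential population it falls well short.

```r
# Illustrative settings (assumptions): n = 22, 95% intervals, 10000 samples
set.seed(2014)
n = 22
alpha = 0.05
nsim = 10000
Cl = qchisq(alpha/2, n - 1)
Cu = qchisq(1 - alpha/2, n - 1)

# Hypothetical helper: TRUE if the chi-square interval from sample x contains sigma2
covers = function(x, sigma2) {
  s2 = var(x)
  (n - 1) * s2 / Cu <= sigma2 && sigma2 <= (n - 1) * s2 / Cl
}

# Normal population with sigma = 1150, so sigma^2 = 1150^2
cover.normal = mean(replicate(nsim, covers(rnorm(n, sd = 1150), 1150^2)))

# Right-skewed exponential population with rate 1, so sigma^2 = 1
cover.exp = mean(replicate(nsim, covers(rexp(n, rate = 1), 1)))

c(normal = cover.normal, exponential = cover.exp)
```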


Larry Ammann
2014-10-14