In the previous section we used the sample standard deviation as an estimate of the population
standard deviation, so it is natural to consider how good this estimate is. **Note**:
*this section is only applicable if the population is approximately normally distributed
or the sample size is large*.

Suppose we have a population of measurements with mean $\mu$ and s.d. $\sigma$, and we have
randomly selected a sample of size *n* from this population. To determine how to construct
confidence intervals for $\sigma$ we can use a thought experiment similar to the one we considered
for the estimation of a population proportion. Suppose we could obtain every possible sample of size
*n* from this population and compute the sample s.d. for each of these samples. The
experiment in which we randomly select a single sample of size *n* and compute the sample
s.d. of this sample would be equivalent to randomly selecting a single sample s.d. from the
population of sample s.d.'s from all possible samples of size *n*. Therefore, probability
statements about the sample s.d. could be derived from the distribution of all possible sample
s.d.'s in a way that is similar to how we constructed confidence intervals for a population
proportion and population mean. In this case, however, statistical theory does not describe the
distribution of the sample s.d. directly. Instead, it tells us that if the population distribution is
approximately a normal curve or if the sample size is large, then the quantity

$$\frac{(n-1)s^2}{\sigma^2},$$

where $s$ is the sample s.d., has a distribution that is referred to as a *chi-square distribution with n-1 degrees of freedom*.
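
The thought experiment above can be approximated by simulation. The following is a minimal sketch, not part of the original analysis: the population is assumed to be exactly normal, and the values of `n`, `sigma`, `reps`, and the seed are arbitrary choices made for illustration. It draws many samples, computes the scaled sample variance for each, and compares the simulated quantiles with chi-square quantiles on *n-1* degrees of freedom.

```r
# Simulate the sampling distribution of (n-1)s^2/sigma^2 for normal data.
set.seed(1)            # arbitrary seed, for reproducibility only
n <- 22                # sample size (illustrative choice)
sigma <- 1150          # assumed population s.d. for the simulation
reps <- 10000          # number of simulated samples

stat <- replicate(reps, {
  x <- rnorm(n, mean = 0, sd = sigma)   # one random sample of size n
  (n - 1) * var(x) / sigma^2            # scaled sample variance
})

# Compare simulated quantiles with chi-square quantiles on n-1 degrees of freedom.
probs <- c(0.025, 0.5, 0.975)
rbind(simulated  = quantile(stat, probs),
      chi.square = qchisq(probs, df = n - 1))
```

If the theory holds, the two rows of the resulting table should agree closely, up to simulation error.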

Since the chi-square distribution is not symmetric, we need to find upper and lower values from this
distribution such that the area between them is the required level of confidence for the confidence
interval, which we write as $1-\alpha$. Let $C_{\text{lower}}$ denote the value from the chi-square
distribution with *n-1* degrees of freedom such that the area below it is $\alpha/2$, and let
$C_{\text{upper}}$ denote the value from the same chi-square distribution such that the area above it
is $\alpha/2$. Then we can make the following probability statement,

$$P\left(C_{\text{lower}} \le \frac{(n-1)s^2}{\sigma^2} \le C_{\text{upper}}\right) = 1-\alpha.$$
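
As a quick numerical check of how the two cut-off values are chosen, the sketch below uses `qchisq()` and `pchisq()` with an assumed confidence level of 95% and an assumed 21 degrees of freedom. The variable names are illustrative only.

```r
alpha <- 0.05
df <- 21                                # n - 1 for a sample of size 22
C_lower <- qchisq(alpha / 2, df)        # area below this value is alpha/2
C_upper <- qchisq(1 - alpha / 2, df)    # area above this value is alpha/2

# Area between the two cut-offs; this should equal 1 - alpha
area <- pchisq(C_upper, df) - pchisq(C_lower, df)

# The cut-offs are not equidistant from the mean of the distribution (df),
# which reflects the asymmetry of the chi-square curve
c(C_lower = C_lower, C_upper = C_upper, area = area)
```

Because the curve is skewed to the right, the upper cut-off lies farther from the mean than the lower one does.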

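The confidence interval for $\sigma^2$ is derived by manipulating the inequality $C_{\text{lower}} \le (n-1)s^2/\sigma^2 \le C_{\text{upper}}$ in this probability statement so that $\sigma^2$ stands alone between the inequality signs. We can do this as follows:

$$C_{\text{lower}} \le \frac{(n-1)s^2}{\sigma^2} \quad\Longrightarrow\quad \sigma^2 \le \frac{(n-1)s^2}{C_{\text{lower}}}$$

and

$$\frac{(n-1)s^2}{\sigma^2} \le C_{\text{upper}} \quad\Longrightarrow\quad \sigma^2 \ge \frac{(n-1)s^2}{C_{\text{upper}}}.$$

Combine these inequalities to obtain

$$\frac{(n-1)s^2}{C_{\text{upper}}} \le \sigma^2 \le \frac{(n-1)s^2}{C_{\text{lower}}}.$$

Therefore a $(1-\alpha)100\%$ confidence interval for $\sigma^2$ is

$$\left[\frac{(n-1)s^2}{C_{\text{upper}}},\ \frac{(n-1)s^2}{C_{\text{lower}}}\right].$$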

Suppose for example that we wish to construct a 95% confidence interval for the population variance
of the difference between list and sales price based on the sample of size 22 in the example above.
In this case we would use a chi-square distribution with 21 degrees of freedom. Quantiles from the
chi-square distribution are obtained in **R** using the function `qchisq()`.

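    alpha = .05                       # 1 - confidence level
    n = 22                            # sample size
    s = 1150                          # sample s.d.
    Cl = qchisq(alpha/2, n-1)         # lower chi-square quantile
    Cu = qchisq(1-alpha/2, n-1)       # upper chi-square quantile
    conf.int = (n-1)*s^2/c(Cu, Cl)    # interval for the population variance
    conf.int

    [1]  782789.7 2700843.7

We can convert this to a confidence interval for $\sigma$ by taking square roots of the interval. This gives

    sqrt(conf.int)

    [1]  884.7541 1643.4244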
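
The steps of this calculation can be collected into a small reusable function. The sketch below is only an illustration: `sd_conf_int()` and its argument names are hypothetical and not part of base **R**. It assumes the same conditions as the rest of this section (an approximately normal population or a large sample).

```r
# Hypothetical helper: chi-square confidence interval for a population
# variance and s.d., given the sample s.d. `s` and the sample size `n`.
sd_conf_int <- function(s, n, conf.level = 0.95) {
  stopifnot(n > 1, s >= 0, conf.level > 0, conf.level < 1)
  alpha <- 1 - conf.level
  Cl <- qchisq(alpha / 2, n - 1)         # lower chi-square quantile
  Cu <- qchisq(1 - alpha / 2, n - 1)     # upper chi-square quantile
  var.int <- (n - 1) * s^2 / c(Cu, Cl)   # interval for the variance
  list(variance = var.int, sd = sqrt(var.int))
}

# Same inputs as the example in this section
res <- sd_conf_int(s = 1150, n = 22)
res$variance
res$sd
```

Called with `s = 1150` and `n = 22`, the function should reproduce the variance interval and the s.d. interval from the example in this section. Changing `conf.level` gives intervals at other levels of confidence.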

2018-10-18