
## Estimation of a Population Variance

In the previous section we used the sample standard deviation as an estimate of the population standard deviation, so it is natural to ask how good this estimate is. Note: this section applies only if the population is approximately normally distributed or the sample size is large.

Suppose we have a population of measurements with mean $\mu$ and s.d. $\sigma$, and we have randomly selected a sample of size $n$ from this population. To determine how to construct confidence intervals for $\sigma$ we can use a thought experiment similar to the one we considered for the estimation of a population proportion. Suppose we could obtain every possible sample of size $n$ from this population and compute the sample s.d. $s$ for each of these samples. The experiment in which we randomly select a single sample of size $n$ and compute the sample s.d. of this sample would be equivalent to randomly selecting a single sample s.d. from the population of sample s.d.'s from all possible samples of size $n$. Therefore, probability statements about the sample s.d. could be derived from the distribution of all possible sample s.d.'s, in a way that is similar to how we constructed confidence intervals for a population proportion and a population mean. In this case, however, statistical theory does not describe the distribution of the sample s.d. directly. Instead, it tells us that if the population distribution is approximately a normal curve or if the sample size is large, then the quantity

$$\frac{(n-1)s^2}{\sigma^2},$$

where $s^2$ is the sample variance, has a distribution that is referred to as a chi-square distribution with $n-1$ degrees of freedom.
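The thought experiment above can be approximated by simulation. The following is a minimal sketch, assuming a normal population with illustrative values `mu = 50` and `sigma = 10` and a sample size of `n = 22`; these values and all variable names are assumptions chosen for the demonstration. It draws many samples, computes (n-1)s^2/sigma^2 for each one, and compares the simulated quantiles with those of a chi-square distribution with n-1 degrees of freedom.

```r
set.seed(1)            # for reproducibility
mu = 50                # assumed population mean (illustrative)
sigma = 10             # assumed population s.d. (illustrative)
n = 22                 # sample size
nsim = 10000           # number of simulated samples

# scaled sample variance (n-1)*s^2/sigma^2 for each simulated sample
X = replicate(nsim, (n-1)*var(rnorm(n, mu, sigma))/sigma^2)

# compare simulated quantiles with chi-square quantiles on n-1 df
p = c(.025, .25, .5, .75, .975)
sim.vs.theory = rbind(simulated = quantile(X, p), chisq = qchisq(p, n-1))
sim.vs.theory
```

The two rows should agree closely. For a visual check, `hist(X, freq=FALSE)` followed by `curve(dchisq(x, n-1), add=TRUE)` overlays the chi-square density on the simulated values.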

Since the chi-square distribution is not symmetric, we need to find upper and lower values from this distribution such that the area between them is the required level of confidence, $1-\alpha$, for the confidence interval. Let $C_{lower}$ denote the value from the chi-square distribution with $n-1$ degrees of freedom such that the area below it is $\alpha/2$, and let $C_{upper}$ denote the value from the same chi-square distribution such that the area above it is $\alpha/2$. Then we can make the following probability statement:

$$P\left(C_{lower} \le \frac{(n-1)s^2}{\sigma^2} \le C_{upper}\right) = 1-\alpha.$$
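As a small illustration of these two values, the sketch below assumes a 95% confidence level (`alpha = .05`) and `n = 22`, both chosen only for the example. It uses `qchisq()` to find the lower and upper values, `pchisq()` to check the area between them, and then shows that they are not equally far from the mean of the distribution, which is n-1.

```r
alpha = .05
n = 22
Clower = qchisq(alpha/2, n-1)      # area below Clower is alpha/2
Cupper = qchisq(1-alpha/2, n-1)    # area above Cupper is alpha/2
c(Clower, Cupper)

# area between the two values equals the confidence level 1-alpha
area = pchisq(Cupper, n-1) - pchisq(Clower, n-1)
area

# distances from the mean (n-1) of the distribution differ,
# which reflects the asymmetry of the chi-square distribution
c((n-1) - Clower, Cupper - (n-1))
```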

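The confidence interval for $\sigma^2$ is derived by manipulating the inequality in this probability statement so that $\sigma^2$ stands alone between the two inequality signs. We can do this as follows:

$$C_{lower} \le \frac{(n-1)s^2}{\sigma^2} \quad\Longrightarrow\quad \sigma^2 \le \frac{(n-1)s^2}{C_{lower}}$$

and

$$\frac{(n-1)s^2}{\sigma^2} \le C_{upper} \quad\Longrightarrow\quad \sigma^2 \ge \frac{(n-1)s^2}{C_{upper}}.$$

Combine these inequalities to obtain

$$\frac{(n-1)s^2}{C_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{C_{lower}}.$$

Therefore a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is

$$\left[\frac{(n-1)s^2}{C_{upper}},\ \frac{(n-1)s^2}{C_{lower}}\right].$$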
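The interval from (n-1)s^2/Cupper to (n-1)s^2/Clower can be wrapped in a short function. This is only a sketch: `var.conf.int` is a hypothetical helper name, not a function supplied by R, and the data below are simulated purely for illustration. It assumes the sample comes from an approximately normal population.

```r
# Hypothetical helper (not part of base R): confidence interval for a
# population variance from a numeric sample x, assuming approximate normality
var.conf.int = function(x, conf.level = .95) {
  n = length(x)
  alpha = 1 - conf.level
  Cl = qchisq(alpha/2, n-1)      # lower chi-square value
  Cu = qchisq(1-alpha/2, n-1)    # upper chi-square value
  (n-1)*var(x)/c(Cu, Cl)         # lower limit uses Cu, upper limit uses Cl
}

set.seed(2)
x = rnorm(22, mean = 0, sd = 1150)   # simulated data, illustrative only
ci = var.conf.int(x)
ci          # interval for the variance
sqrt(ci)    # interval for the s.d.
```

Note that the larger chi-square value produces the lower limit, because it appears in the denominator.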

Suppose for example that we wish to construct a 95% confidence interval for the population variance of the difference between list and sales price based on the sample of size 22 in the example above. In this case we would use a chi-square distribution with 21 degrees of freedom. Quantiles from the Chi-square distribution are obtained in R using the function qchisq().

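```r
alpha = .05                      # 1 - confidence level
n = 22                           # sample size
s = 1150                         # sample s.d.
Cl = qchisq(alpha/2,n-1)         # lower chi-square value, 21 df
Cu = qchisq(1-alpha/2,n-1)       # upper chi-square value, 21 df
conf.int = (n-1)*s^2/c(Cu,Cl)    # interval for the variance
conf.int
# [1]  782789.7 2700843.7
```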

We can convert this to a confidence interval for the population s.d. $\sigma$ by taking square roots of the endpoints of the interval. This gives

```r
sqrt(conf.int)
# [1]  884.7541 1643.4244
```
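One way to check that this procedure behaves as advertised is to estimate its coverage by simulation. The sketch below assumes a normal population whose true s.d. is 1150 (an illustrative choice that borrows the sample value from the example; the true value is of course unknown in practice) and counts how often the 95% interval for the variance contains the true variance.

```r
set.seed(3)
sigma = 1150      # assumed true population s.d. (illustrative)
n = 22            # sample size
alpha = .05
nsim = 5000       # number of simulated samples
Cl = qchisq(alpha/2, n-1)
Cu = qchisq(1-alpha/2, n-1)

# for each simulated sample, record whether the interval covers sigma^2
covered = replicate(nsim, {
  s2 = var(rnorm(n, 0, sigma))
  ci = (n-1)*s2/c(Cu, Cl)
  ci[1] <= sigma^2 && sigma^2 <= ci[2]
})
coverage = mean(covered)
coverage          # proportion of intervals that contain sigma^2
```

With a normal population the proportion should be close to 0.95. Replacing `rnorm()` with a strongly skewed distribution is a quick way to see how sensitive the interval is to the normality assumption.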


ammann
2018-10-18