This example considers how we could illustrate the performance of a confidence interval to estimate a population parameter. First consider large sample confidence intervals for a population proportion.

Simulation goals:

1. Illustrate how the central limit theorem is applicable.
2. Demonstrate that it is the interval that is random, not the parameter. That is, the interval
is what changes from sample to sample.

3. Demonstrate how the widths of confidence intervals depend on sample size.

4. Show that the probability the interval contains the population proportion *p* does not
depend on *p*.

5. Construct scripts for this demonstration that could be adapted easily to other types of
confidence intervals.

To satisfy goal 1, we can simulate *N* random samples from *Binomial(n,p)*, construct
confidence intervals for each sample, and determine whether or not each interval contains
*p*. Parameters for this sinmulation would include

Visualizations for this demonstration include:

1. Display the histogram and quantile-quantile plot for sample proportions of the simulated
samples.

2. Display the intervals in a way that shows how they vary from sample to sample and include an
indicator for which intervals contain *p* and which do not. This can be accomplished by
plotting vertical bars representing the intervals across the x-axis. Different colors for these
bars can be used to differentiate between intervals that contain *p* and those that do not.

The script to accomplish these goals should begin with specification of the simulation parameters
so that they can be modified easily. This is followed by obtaining the simulated samples,
constructing the confidence intervals, and then determining which intervals contain *p*.

### parameters ### N = 200 #number of simulated samples n = 400 #sample size p = .36 #population proportion alpha = .05 #for 95% confidence interval conf.int.colors = c("cyan","red") #red-green is not a good choice in case a viewer is color-blind ### simulate samples ### Xall = rbinom(N,n,p)/n # sample proportions ### construct intervals and store them in a matrix with 2 columns zval = qnorm(1-alpha/2) sig = sqrt(p*(1-p)/n) sighat = sqrt(Xall*(1-Xall)/n) conf.mat = matrix(0,N,2) conf.mat[,1] = Xall - zval*sighat conf.mat[,2] = Xall + zval*sighat ### indicate which do not contain p ci.out = conf.mat[,1] > p | conf.mat[,2] < p cnt.out = sum(ci.out) #number of intervals that missed p conf.col = rep(conf.int.colors[1],N) conf.col[ci.out] = conf.int.colors[2] ### show histogram and QQ-plot hist(Xall,col="cyan",main="") title(paste("Histogram of Sample Proportions\nfrom",N,"Random Samples")) title(sub=paste("Sample size =",n)) ### plot confidence intervals ### ### add each bar sequentially for in-class display Y.lim = range(conf.mat) plot(0,0,xlim=c(0,N),ylim=Y.lim,xlab="",ylab="",type="n") title(paste("Simulation of",paste(round(100*(1-alpha)),"%",sep=""), "Confidence Intervals for", paste("Binom(",p,")",sep=""))) mtext(paste("Sample size =",n),line=.25) for(k in seq(N)) { rect(k-1,conf.mat[k,1],k,conf.mat[k,2],col=conf.col[k]) } mtext(paste(paste(round(100*cnt.out/N,1),"%",sep=""), "of these confidence intervals do not contain",p),side=1,line=3) abline(h=p) ### repeat confidence interval plot using n=1000 ### keep same y-limits n = 1000 #sample size ### simulate samples ### Xall = rbinom(N,n,p)/n # sample proportions ### construct intervals and store them in a matrix with 2 columns zval = qnorm(1-alpha/2) sig = sqrt(p*(1-p)/n) sighat = sqrt(Xall*(1-Xall)/n) conf.mat = matrix(0,N,2) conf.mat[,1] = Xall - zval*sighat conf.mat[,2] = Xall + zval*sighat ### indicate which do not contain p ci.out = conf.mat[,1] > p | conf.mat[,2] < p cnt.out = sum(ci.out) #number of intervals that missed p conf.col = rep(conf.int.colors[1],N) conf.col[ci.out] = conf.int.colors[2] ### plot confidence intervals ### ### add each bar sequentially for in-class display plot(0,0,xlim=c(0,N),ylim=Y.lim,xlab="",ylab="",type="n") title(paste("Simulation of",paste(round(100*(1-alpha)),"%",sep=""), "Confidence Intervals for",paste("Binom(",p,")",sep=""))) mtext(paste("Sample size =",n),line=.25) for(k in seq(N)) { rect(k-1,conf.mat[k,1],k,conf.mat[k,2],col=conf.col[k]) } mtext(paste(paste(round(100*cnt.out/N,1),"%",sep=""), "of these confidence intervals do not contain",p),side=1,line=3) abline(h=p)

Note that we have used the same code twice, so let's define a function that returns the matrix
of confidence limits. Arguments for this function should be parameters of the simulation. Then
define a function that plots the intervals. Include arguments for specifying whether or not we
want to include histogram and qqnorm plots. These functions are defined in the script:

http://www.utdallas.edu/~ammann/stat6341scripts/simConfInt.r

Examples of its use are in

http://www.utdallas.edu/~ammann/stat6341scripts/confsim.r

Now suppose we wish to extend these functions to include confidence intervals for a mean.

http://www.utdallas.edu/~ammann/stat6341scripts/simConfInt1.r

Examples are in

http://www.utdallas.edu/~ammann/stat6341scripts/confsim1.r

Further extend to include confidence intervals for a s.d.

http://www.utdallas.edu/~ammann/stat6341scripts/simConfInt2.r

Examples are in

http://www.utdallas.edu/~ammann/stat6341scripts/confsim2.r

Additional examples that simulate heavy-tailed and asymmetric distributions are in

http://www.utdallas.edu/~ammann/stat6341scripts/confsim3.r

2017-11-01