
### Simulation for Binomial Distributions

Suppose we have a very large population that contains two types of individuals, for example:

- Males - Females
- Pass - Fail
- Pays Income Tax - Does Not Pay Income Tax
- Vote for Obama - Vote for Romney

We will label one of these types Success and the other Failure. These labels are arbitrary and are used so we can talk about this problem in general. We can use the Binomial model to represent the number of successes when individuals are selected at random from the population. This model will be appropriate if the population size is large compared to the sample size.
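
To see why the size of the population matters, one can compare exact sampling without replacement (the hypergeometric model, R's `dhyper`) with the Binomial model. The sketch below assumes a hypothetical population of 100,000 individuals, 20% of whom are Successes; these numbers are chosen only for illustration and do not come from the notes.

```r
# Hypothetical population (illustrative values, not from the notes)
N = 100000          # population size
K = 20000           # number of Successes in the population (20%)
n = 100             # sample size
x = seq(0, n)
# exact probabilities when sampling without replacement
exact_probs = dhyper(x, K, N - K, n)
# Binomial model with the same proportion of Successes
binom_probs = dbinom(x, n, K / N)
# largest disagreement between the two models over all values of x
max_diff = max(abs(exact_probs - binom_probs))
print(max_diff)
```

Shrinking `N` toward the sample size (for example `N = 500`, `K = 100`) should make the disagreement noticeably larger, which is the situation in which the Binomial model is no longer appropriate.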

Let $X$ denote the number of Successes in a random sample of size $n$ and let $p$ denote the proportion of Successes in the population. Note that in this case, the proportion of Failures in the population is $1-p$ and the number of Failures in the sample is $n-X$. Probability theory shows that

$$P(X = k) = \binom{n}{k}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n.$$

This statement can be interpreted as follows. Suppose we consider all possible samples of size $n$ that can be selected from this population. The proportion of such samples that contain exactly $k$ Successes is given by $\binom{n}{k}\, p^k (1-p)^{n-k}$.
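
As a quick numerical check, the binomial probability $\binom{n}{k} p^k (1-p)^{n-k}$ can be computed directly and compared with R's built-in `dbinom`. The values of `n`, `p`, and `k` below are arbitrary choices made for illustration.

```r
n = 10      # sample size (illustrative)
p = 0.2     # proportion of Successes in the population (illustrative)
k = 3       # number of Successes of interest
# binomial probability computed directly from the formula
by_hand = choose(n, k) * p^k * (1 - p)^(n - k)
# the same probability from R's built-in probability function
built_in = dbinom(k, n, p)
print(c(by_hand, built_in))
```

Both values should agree, at about 0.2013.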

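R has functions for working with many probability models. The names of these functions begin with one of the letters d, p, q, r, followed by R's name for the model, `binom` in this case. `dbinom(x,size,prob)` gives the probability function, $P(X = x)$, for a random sample of size `size` and success probability `prob`; `pbinom(q,size,prob)` gives the cumulative probability $P(X \le q)$; `qbinom(p,size,prob)` gives quantiles; and `rbinom(n,size,prob)` returns `n` simulated values of the number of Successes, one for each of `n` independent random samples of size `size` (here `n` is the number of replications, not the sample size). Examples: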

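```r
n = 100    # sample size
p = .2     # proportion of Successes in the population
x = seq(0,n)
# probability function P(X = x); vertical bars suit a discrete distribution
db = dbinom(x,n,p)
plot(x,db,type="h",main="Binomial Probability Function")
# cumulative probability function P(X <= x), drawn as a step function
pb = pbinom(x,n,p)
plot(x,pb,type="s",main="Binomial Cumulative Probability Function")
# simulate 1000 random samples of size n=100 from population with p=.2
nrep = 1000
rb = rbinom(nrep,n,p)
# each element of rb represents number of successes in a random sample of size 100
hist(rb,col="cyan") #show histogram
# this looks bell-shaped, so show qqnorm
qqnorm(rb)
# reference line with intercept mean(rb) and slope sd(rb)
abline(a=mean(rb),b=sd(rb),col="red")
# mean and s.d. of the simulated numbers of successes
print(mean(rb))
print(sd(rb))
```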
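
The four functions are tied together in a simple way, which the following sketch tries to make concrete: `pbinom` is the running sum of `dbinom`, `qbinom` inverts `pbinom`, and `rbinom` produces counts between 0 and `size`. The probability level 0.9, the seed, and the variable names are arbitrary choices made for illustration.

```r
n = 100
p = 0.2
x = seq(0, n)
# pbinom is the running sum of dbinom: P(X <= x) = sum of P(X = j) for j <= x
cdf_from_pmf = cumsum(dbinom(x, n, p))
print(max(abs(cdf_from_pmf - pbinom(x, n, p))))   # essentially zero
# qbinom returns the smallest x with P(X <= x) >= the requested probability
q90 = qbinom(0.9, n, p)
print(q90)
print(pbinom(q90, n, p))       # at least 0.9
print(pbinom(q90 - 1, n, p))   # below 0.9
# rbinom returns simulated counts, each between 0 and n
set.seed(1)                    # arbitrary seed, only for reproducibility
counts = rbinom(5, n, p)
print(counts)
```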


Central Limit Theorem: if the sample size $n$ is large, then the histogram of the number of Successes over all possible samples of size $n$ is approximately a normal distribution (bell curve) with mean $np$ and s.d. $\sqrt{np(1-p)}$. This is illustrated by the last lines of the above example. Note that in that example, $np = 20$ and $\sqrt{np(1-p)} = 4$.
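
One way to check the normal approximation numerically is to compare simulated counts with the theoretical mean $np$ and s.d. $\sqrt{np(1-p)}$, and to compare an exact binomial probability with its normal counterpart. This is only a sketch: the seed, the number of replications, the cutoff 24, and the continuity correction (adding 0.5 before calling `pnorm`) are choices made here for illustration and are not part of the example above.

```r
n = 100
p = 0.2
mu = n * p                      # theoretical mean of the number of Successes
sigma = sqrt(n * p * (1 - p))   # theoretical s.d. of the number of Successes
set.seed(2013)                  # arbitrary seed so the run is reproducible
rb = rbinom(10000, n, p)        # 10000 simulated samples of size n
# simulated mean and s.d. next to their theoretical values
print(c(mean(rb), mu))
print(c(sd(rb), sigma))
# exact binomial probability vs. normal approximation for P(X <= 24)
exact = pbinom(24, n, p)
normal_approx = pnorm(24.5, mu, sigma)   # 24.5 applies a continuity correction
print(c(exact, normal_approx))
```

With a large number of replications the simulated mean and s.d. should land close to 20 and 4, and the two probabilities should differ only slightly.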

Larry Ammann
2013-12-17