next up previous
Next: Estimation Up: Probability models and simulation Previous: Probability models and simulation

Simulation for Binomial Distributions

Suppose we have a very large population that contains two types of individuals, for example,
Pays Income Tax - Does Not Pay Income Tax,
Vote for Obama - Vote for Romney
We will label one of these types Success and the other Failure. These labels are arbitrary and are used so we can talk about this problem in general. We can use the Binomial model to represent the number of successes when $n$ individuals are selected at random from the population. This model will be appropriate if the population size is large compared to the sample size.

Let $N$ denote the number of Successes in a random sample of size $n$ and let $p$ denote the proportion of Successes in the population. Note that in this case, the proportion of Failures in the population is $1-p$ and the number of failures in the sample is $n-N$. Probability theory shows that

P(N=k) = {n\choose{k}} p^k(1-p)^{n-k},\ \ k=0,1,\dots,n.

This sentence can be interpreted as follows. Suppose we consider all possible samples of size $n$ that can be selected from this population. The proportion of such samples that contain exactly $k$ successes is given by

{n\choose{k}} p^k(1-p)^{n-k}.

R has functions to simulate many probability models. These functions begin with one of the letters d,p,q,r followed by R's name for the model, binom in this case. dbinom(x,size,prob) gives the probability function, $P(N=x)$, for a random sample of size size and success probability prob; pbinom(q,size,prob) gives $P(N\le q)$, qbinom(p,size,prob) gives quantiles, and rbinom(r,size,prob) gives a random sample of size $r$ from the population. Examples:

db = dbinom(x,n,p)
plot(x,db,type="l",main="Binomial Probability Function")
pb = pbinom(x,n,p)
plot(x,pb,type="l",main="Binomial Cumulative Probability Function")
# simulate 1000 random samples of size n=100 from population with p=.2
nrep = 1000
rb = rbinom(nrep,n,p)
# each element of rb represents number of successes in a random sample of size 100
hist(rb,col="cyan") #show histogram
# this looks bell-shaped, so show qqnorm
# mean and s.d. of sample proportions

Central Limit Theorem: if the sample size $n$ is large, then the histogram of all possible samples of size $n$ is approximately a normal distribution (bell-curve) with mean $np$ and s.d. $\sqrt{np(1-p)}$. This is illustrated by the last lines of the above example. Note that in that example, $np = 100(.2) = 20$ and $\sqrt{np(1-p)} = \sqrt{100(.2)(.8)} = \sqrt{16} = 4$.

next up previous
Next: Estimation Up: Probability models and simulation Previous: Probability models and simulation
Larry Ammann