Suppose we have a very large population that contains two types of individuals, for example,

Males-Females,

Pass-Fail,

Pays Income Tax - Does Not Pay Income Tax,

Vote for Obama - Vote for Romney

We will label one of these types *Success* and the other *Failure*. These labels are arbitrary
and are used so we can talk about this problem in general. We can use the **Binomial** model
to represent the number of successes when individuals are selected at random from the population.
This model will be appropriate if the population size is large compared to the sample size.
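
To see why the population must be large compared to the sample, we can compare the binomial probabilities with the exact probabilities for sampling without replacement, which **R** provides as *dhyper*. The following is only a sketch: the population sizes, sample size, and success proportion are hypothetical values chosen for illustration, and the variable names are not taken from the text.

```r
# Hypothetical values chosen for illustration only
n_sample <- 100          # sample size
p        <- 0.2          # proportion of successes in the population
x        <- 0:n_sample   # possible numbers of successes in the sample

# Binomial probabilities (these do not depend on the population size)
binom_prob <- dbinom(x, size = n_sample, prob = p)

# Exact probabilities when sampling without replacement from a
# population of N individuals, of which round(N * p) are successes
exact_prob <- function(N) {
  successes <- round(N * p)
  dhyper(x, m = successes, n = N - successes, k = n_sample)
}

# Largest disagreement between the two models
err_large <- max(abs(exact_prob(100000) - binom_prob))  # population >> sample
err_small <- max(abs(exact_prob(200) - binom_prob))     # population only 2x sample

print(err_large)
print(err_small)
```

The disagreement should be much smaller for the large population than for the small one, which is the sense in which the binomial model is appropriate only when the population is large relative to the sample.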

Let $X$ denote the number of *Successes* in a random sample of size $n$ and let $p$ denote the
proportion of *Successes* in the population. Note that in this case, the proportion of
*Failures* in the population is $1-p$ and the number of failures in the sample is $n-X$.
Probability theory shows that

$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n.$$

This statement can be interpreted as follows. Suppose we consider all possible samples of size $n$ that can be selected from this population. The proportion of such samples that contain exactly $x$ successes is given by $P(X = x)$.
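
As a quick check, the binomial probability of exactly x successes in a sample of size n, choose(n, x) p^x (1-p)^(n-x), can be evaluated term by term and compared with **R**'s built-in *dbinom*. The values of n and p below are hypothetical and chosen only to keep the output small.

```r
# Hypothetical example values (not from the text)
n <- 10     # sample size
p <- 0.3    # proportion of successes in the population
x <- 0:n    # every possible number of successes

# Binomial probabilities written out from the formula
by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)

# The same probabilities from R's built-in function
by_dbinom <- dbinom(x, size = n, prob = p)

# The two vectors should agree up to rounding error,
# and the probabilities should add up to 1
print(all.equal(by_formula, by_dbinom))
print(sum(by_formula))
```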

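**R** has functions for working with many probability models. The names of these functions begin
with one of the letters *d, p, q, r* followed by **R**'s name for the model, *binom* in this case.
*dbinom(x,size,prob)* gives the probability function, $P(X = x)$, the probability of exactly *x*
successes in a random sample of size *size* with success probability *prob*;
*pbinom(q,size,prob)* gives the cumulative probability $P(X \le q)$; *qbinom(p,size,prob)* gives
quantiles; and *rbinom(r,size,prob)* gives *r* simulated values, each the number of successes in
a random sample of size *size* from the population (in **R**'s own documentation this first
argument is named *n*). Examples: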

    n = 100
    p = .2
    x = seq(0, n)

    # probability function P(X = x)
    db = dbinom(x, n, p)
    plot(x, db, type="l", main="Binomial Probability Function")

    # cumulative probability function P(X <= x)
    pb = pbinom(x, n, p)
    plot(x, pb, type="l", main="Binomial Cumulative Probability Function")

    # simulate 1000 random samples of size n=100 from population with p=.2
    nrep = 1000
    rb = rbinom(nrep, n, p)
    # each element of rb represents number of successes in a random sample of size 100
    hist(rb, col="cyan")   # show histogram

    # this looks bell-shaped, so show qqnorm
    qqnorm(rb)
    # reference line with intercept mean(rb) and slope sd(rb)
    abline(a=mean(rb), b=sd(rb), col="red")

    # mean and s.d. of the simulated numbers of successes
    print(mean(rb))
    print(sd(rb))
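
The four functions are closely related, and a short sketch can make the relationships concrete: *pbinom* is the running sum of *dbinom*, and *qbinom* inverts *pbinom*. The values of `size` and `prob` and the 0.9 probability level below are hypothetical choices for illustration.

```r
# Hypothetical example values (not from the text)
size <- 20
prob <- 0.5

# pbinom is the cumulative sum of dbinom over 0, 1, ..., size
cum_from_d <- cumsum(dbinom(0:size, size, prob))
cum_from_p <- pbinom(0:size, size, prob)
print(all.equal(cum_from_d, cum_from_p))

# qbinom returns the smallest q with P(X <= q) >= the requested probability
q90 <- qbinom(0.9, size, prob)
print(q90)
print(pbinom(q90, size, prob))       # at least 0.9
print(pbinom(q90 - 1, size, prob))   # below 0.9

# rbinom returns simulated counts of successes; set.seed makes them repeatable
set.seed(1)
draws <- rbinom(5, size, prob)
print(draws)
```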

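**Central Limit Theorem**: if the sample size $n$ is large, then the histogram of the number of
successes over all possible samples of size $n$ is approximately a normal distribution
(bell-curve) with mean $np$ and s.d. $\sqrt{np(1-p)}$, where $p$ is the proportion of successes
in the population. This is illustrated by the last lines of the above example. Note that in that
example, $np = 100 \times 0.2 = 20$ and $\sqrt{np(1-p)} = \sqrt{100 \times 0.2 \times 0.8} = 4$.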
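
The normal approximation can also be checked without simulation by comparing the exact binomial cumulative probabilities with those of a normal distribution that has the same mean and s.d. The sketch below uses the sample size 100 and success proportion .2 from the example above. The shift of 0.5 is the usual continuity correction, which is an addition here and not something the text relies on.

```r
n <- 100
p <- 0.2

mu    <- n * p                    # mean of the number of successes: 20
sigma <- sqrt(n * p * (1 - p))    # s.d. of the number of successes: 4

x <- 0:n

# Exact cumulative probabilities P(X <= x) from the binomial model
exact <- pbinom(x, n, p)

# Normal approximation with the same mean and s.d.
# (x + 0.5 is a continuity correction, an assumption of this sketch)
approx <- pnorm(x + 0.5, mean = mu, sd = sigma)

# Largest disagreement between the two cumulative curves
max_diff <- max(abs(exact - approx))
print(max_diff)
```

A small value of `max_diff` indicates that the bell-curve with mean 20 and s.d. 4 tracks the binomial distribution closely at this sample size.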

2013-12-17