Suppose we have a very large population that contains two types of individuals, for example,
Males-Females,
Pass-Fail,
Pays Income Tax - Does Not Pay Income Tax,
Vote for Obama - Vote for Romney
We will label one of these types Success and the other Failure. These labels are arbitrary
and are used so we can talk about this problem in general. We can use the Binomial model
to represent the number of successes when $n$
individuals are selected at random from the population.
This model will be appropriate if the population size is large compared to the sample size.
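One way to see why the population should be large relative to the sample: sampling without replacement from a finite population follows the hypergeometric model exactly, and the Binomial model only approximates it. The sketch below compares the two using R's dhyper and dbinom. The sample size of 20, the success proportion 0.2, the population sizes 50 and 100000, and the helper name max_gap are illustrative assumptions, not taken from the text.

```r
# Sample n = 20 individuals without replacement from a population of size N
# in which a proportion p = 0.2 are Successes (illustrative values).
n <- 20
p <- 0.2
x <- 0:n

# Largest absolute difference between the exact (hypergeometric) and
# Binomial probabilities for a given population size N.
# max_gap is a hypothetical helper name, not part of R.
max_gap <- function(N) {
  n_success <- round(N * p)     # number of Successes in the population
  n_failure <- N - n_success    # number of Failures in the population
  exact  <- dhyper(x, n_success, n_failure, n)
  approx <- dbinom(x, n, p)
  max(abs(exact - approx))
}

gap_small <- max_gap(50)        # population only 2.5 times the sample size
gap_large <- max_gap(100000)    # population far larger than the sample
print(c(small = gap_small, large = gap_large))
```

The gap should be noticeably larger for the small population, which is the situation in which the Binomial model is not appropriate.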
Let $X$ denote the number of Successes in a random sample of size $n$, and let $p$ denote the
proportion of Successes in the population. Note that in this case, the proportion of
Failures in the population is $1-p$ and the number of Failures in the sample is $n-X$.
Probability theory shows that
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n.$$
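The Binomial probability function, $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, can be checked numerically. The following is a minimal sketch that codes the formula directly and compares it with R's built-in dbinom; the helper name binom_pmf and the values n = 10, p = 0.2 are assumptions chosen for illustration.

```r
# Binomial probability function written directly from the formula
# P(X = x) = choose(n, x) * p^x * (1 - p)^(n - x).
# binom_pmf is a hypothetical helper name, not part of R.
binom_pmf <- function(x, n, p) {
  choose(n, x) * p^x * (1 - p)^(n - x)
}

n <- 10
p <- 0.2
x <- 0:n

by_hand  <- binom_pmf(x, n, p)
built_in <- dbinom(x, n, p)

print(all.equal(by_hand, built_in))  # compares the two up to rounding error
print(sum(by_hand))                  # probabilities over x = 0, ..., n
```

The probabilities over all possible values of x should add up to 1, which is a quick sanity check on any probability function.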
R has functions for working with many probability models. These functions begin with one of the
letters d, p, q, r followed by R's name for the model, binom in this case.
dbinom(x,size,prob) gives the probability function, $P(X = x)$, for a random sample of size
size and success probability prob; pbinom(q,size,prob) gives the cumulative probability
$P(X \le q)$; qbinom(p,size,prob) gives quantiles; and rbinom(n,size,prob) gives n simulated
values of $X$, each the number of Successes in a random sample of size size
from the population. (In rbinom, the first argument is the number of simulated values, not the sample size.) Examples:
    n = 100
    p = .2
    x = seq(0, n)
    db = dbinom(x, n, p)
    plot(x, db, type="l", main="Binomial Probability Function")
    pb = pbinom(x, n, p)
    plot(x, pb, type="l", main="Binomial Cumulative Probability Function")
    # simulate 1000 random samples of size n=100 from population with p=.2
    nrep = 1000
    rb = rbinom(nrep, n, p)
    # each element of rb represents number of successes in a random sample of size 100
    hist(rb, col="cyan")  # show histogram
    # this looks bell-shaped, so show qqnorm
    qqnorm(rb)
    # reference line with intercept mean(rb) and slope sd(rb)
    abline(a=mean(rb), b=sd(rb), col="red")
    # mean and s.d. of the simulated numbers of successes
    print(mean(rb))
    print(sd(rb))
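The four functions are tied together, and a short check can make the relationships concrete. This is a sketch under the same settings as the example (sample size 100, success probability 0.2); the 0.9 quantile level, the seed, and the variable names are arbitrary choices for illustration.

```r
n <- 100
p <- 0.2

# pbinom is the running total of dbinom:
# P(X <= q) is the sum of P(X = x) over x = 0, ..., q.
cum_by_hand <- cumsum(dbinom(0:n, n, p))
print(all.equal(cum_by_hand, pbinom(0:n, n, p)))

# qbinom inverts pbinom: it returns the smallest x with P(X <= x) >= 0.9.
q90 <- qbinom(0.9, n, p)
print(q90)
print(pbinom(q90, n, p) >= 0.9)
print(pbinom(q90 - 1, n, p) < 0.9)

# rbinom draws simulated counts; set.seed makes the draws reproducible.
set.seed(1)                 # arbitrary seed, chosen only for illustration
draws <- rbinom(5, n, p)    # 5 simulated values of X
print(draws)
```

Each value in draws is a count between 0 and 100, the number of Successes in one simulated sample.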
Central Limit Theorem: if the sample size $n$ is large, then the histogram of the values of $X$
from all possible samples of size $n$ is approximately a normal distribution (bell curve) with mean $np$
and s.d. $\sqrt{np(1-p)}$. This is illustrated by the last lines of the above example. Note that in that example,
$np = 100(0.2) = 20$ and $\sqrt{np(1-p)} = \sqrt{100(0.2)(0.8)} = 4$.
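The normal approximation can also be checked numerically. The sketch below compares simulated values with the theoretical mean $np$ and s.d. $\sqrt{np(1-p)}$, and compares one exact Binomial probability with its normal counterpart. The seed, the 10000 replications, the cutoff 24, and the use of a continuity correction (evaluating the normal curve at 24.5) are assumptions made for this illustration.

```r
n <- 100
p <- 0.2

mu    <- n * p                    # theoretical mean of X
sigma <- sqrt(n * p * (1 - p))    # theoretical s.d. of X

set.seed(123)                     # arbitrary seed so the run is reproducible
rb <- rbinom(10000, n, p)         # 10000 simulated counts of Successes

print(c(mean = mean(rb), theory = mu))
print(c(sd = sd(rb), theory = sigma))

# Exact Binomial probability P(X <= 24) versus the normal approximation,
# evaluated at 24.5 (continuity correction, an assumption of this sketch).
exact  <- pbinom(24, n, p)
approx <- pnorm(24.5, mean = mu, sd = sigma)
print(c(exact = exact, approx = approx))
```

With these settings the simulated mean and s.d. should land close to 20 and 4, and the two probabilities should agree to about two decimal places.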