Many important statistical problems can be expressed as the problem of determining some characteristic of a population when it is not possible or feasible to measure every individual in the population. For example, political candidates may wish to determine the proportion of voters in a state who intend to vote for them; an advertising agency may wish to determine the proportion of a target population who react favorably to an ad campaign; a manufacturer may wish to determine the mean warranty cost per unit of a product. Since it is not possible or feasible to contact every individual in the respective populations, the only reasonable alternative is to select in some way a sample from the population and use the information contained within the sample to estimate the population characteristic of interest.
At first thought, it would seem that what should be done here is to select a representative sample from the population, since such a sample would mirror the properties of the population. Suppose, for example, that we would like to determine the proportion of voters in a state who intend to vote for a particular candidate for governor. Let $p$ denote this proportion. A representative sample selected from this population should have a sample proportion that is close to $p$. The problem, though, is how to select such a sample. In fact, it is not possible to do this, for even if the proportion in the sample were close to $p$, we would not know it because we don't know the value of $p$.
Furthermore, an estimate derived from a sample has no value unless we can make some statement about its accuracy. Suppose that $\hat{p}$ is the proportion in the sample who favor that candidate. Then the error of prediction would be $\hat{p} - p$, where $p$ is the true population proportion. Obviously, we cannot make an exact statement about this error since we don't know $p$. However, if the sample is selected randomly so that each individual in the population has the same chance of being selected, then it is possible to make a probability statement about the estimation error. Random sampling is the only type of sampling with which we can make reasonable statements about the prediction error.
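To illustrate what a probability statement about the estimation error can look like, the following is a minimal simulation sketch, not a definitive method. It assumes a hypothetical true proportion of 0.52, a sample size of 1000, and a margin of 0.03; none of these values come from the text. It also assumes the population is large enough that each sampled voter can be treated as an independent draw. The function names `sample_proportion` and `fraction_within` are made up for this example.

```python
import random


def sample_proportion(p, n, rng):
    """Simulate a random sample of n voters and return the sample proportion.

    Assumption: the population is so large that each draw is independent
    and favors the candidate with probability p.
    """
    favorable = sum(1 for _ in range(n) if rng.random() < p)
    return favorable / n


def fraction_within(p, n, margin, trials, seed=0):
    """Estimate how often the sample proportion lands within `margin` of p.

    Repeats the sampling `trials` times and returns the fraction of
    samples whose estimation error is at most `margin` in absolute value.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(trials):
        p_hat = sample_proportion(p, n, rng)
        if abs(p_hat - p) <= margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(fraction_within(p=0.52, n=1000, margin=0.03, trials=2000))
```

The simulation has to be told the value of the true proportion, which is never available in practice. Its purpose is only to show that, under random sampling, the fraction of samples whose error stays within a given margin is a stable and predictable quantity, which is what makes a probability statement about the error possible.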