The tests described in the previous section involve a comparison between a population mean and a standard. A more common situation is a comparison of the means of two populations. Suppose, for example, that we would like to compare the mean salaries of female and male financial analysts. This comparison can be expressed as a test of the hypotheses

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_A: \mu_1 \ne \mu_2,$$

where $\mu_1$ represents the population mean salary of all male financial analysts and $\mu_2$ represents the population mean salary of all female financial analysts. This problem is stated as a two-sided hypothesis so that we can detect an increase as well as a decrease in female salaries compared to male salaries. We will assume that these populations have approximately normal distributions or that we have large sample sizes so that the central limit theorem can be applied. The simplest way to make this comparison is to select random samples separately from each group. This sampling method produces independent samples.

Let $\mu_1, \sigma_1$ denote the population mean and standard deviation of male salaries, let $\mu_2, \sigma_2$ denote the population mean and standard deviation of female salaries, and let $n_1, \bar{x}_1, s_1$ and $n_2, \bar{x}_2, s_2$ denote the sample sizes, sample means, and sample standard deviations for the respective samples. It would be reasonable to base our decision on $\bar{x}_1 - \bar{x}_2$, the difference between the sample means. To construct a test statistic based on this difference, we need to determine its sampling distribution. That is, we must find the distribution of $\bar{x}_1 - \bar{x}_2$ from all possible samples of size $n_1$ for males and $n_2$ for females. Let

$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

denote the estimated standard deviation of $\bar{x}_1 - \bar{x}_2$. Statistical theory shows that if the populations are approximately normal or if the sample sizes are large, then

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{SE}$$

has approximately a t-distribution with degrees of freedom given by

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

Under the assumption that the null hypothesis is true, $\mu_1 - \mu_2 = 0$, and so

$$T = \frac{\bar{x}_1 - \bar{x}_2}{SE}$$

has approximately a t-distribution with $\nu$ degrees of freedom. Strong evidence for this two-sided alternative would be sample means that are far apart. Therefore, the p-value is $2P(t > |T|)$, where $t$ denotes a random variable having a t-distribution with $\nu$ degrees of freedom.

This test is referred to as Welch's approximation to the two-sample t-test.
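
The calculation behind Welch's approximation can be sketched in R, the language that the `pt` and `pf` calls used later in this section appear to come from. This is a minimal sketch, not a definitive implementation: the helper name `welch_test` and the simulated salaries are hypothetical and are not the data of this section. The result is compared with R's built-in `t.test`, which uses Welch's approximation by default.

```r
# Welch's two-sample t-test from summary statistics.
# welch_test is a hypothetical helper written for illustration only.
welch_test <- function(xbar1, s1, n1, xbar2, s2, n2) {
  v1 <- s1^2 / n1                        # estimated variance of the first sample mean
  v2 <- s2^2 / n2                        # estimated variance of the second sample mean
  se <- sqrt(v1 + v2)                    # standard error of the difference
  t_stat <- (xbar1 - xbar2) / se         # test statistic under H0: mu1 = mu2
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch degrees of freedom
  p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value
  list(statistic = t_stat, df = df, se = se, p.value = p_value)
}

# Hypothetical salaries in $1000 units (simulated, for illustration only)
set.seed(1)
x <- rnorm(25, mean = 80, sd = 8)
y <- rnorm(18, mean = 75, sd = 10)

res <- welch_test(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
ref <- t.test(x, y)   # built-in Welch test (var.equal = FALSE is the default)
```

Here `pt` is given the unrounded degrees of freedom, as `t.test` does. Rounding the degrees of freedom to a whole number, as is done when using a table, changes the p-value only slightly.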

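Care should be taken with one-sided alternatives, since only one direction indicates strong evidence for the alternative. If the hypotheses are

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_A: \mu_1 > \mu_2,$$

then strong evidence for the alternative hypothesis would be a value of $\bar{x}_1 - \bar{x}_2$ that is a large positive number. In this case, let

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

The p-value is $P(t > T)$, where $t$ has a t-distribution with the degrees of freedom of Welch's approximation. If $\bar{x}_2$ is larger than $\bar{x}_1$, then $T$ would be negative and so this p-value would be greater than 0.5 and we would not reject the null hypothesis. If the hypotheses are

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_A: \mu_1 < \mu_2,$$

then let

$$T = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$

The p-value in this case is again $P(t > T)$. If $\bar{x}_1$ is larger than $\bar{x}_2$, then $T$ would be negative and so this p-value would be greater than 0.5 and we would not reject the null hypothesis.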
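
The orientation of the statistic for the two one-sided alternatives can be sketched in R as follows. This is only an illustration: the helper `one_sided_p`, its `alternative` argument, and the summary statistics passed to it are hypothetical, with group 1 taken to be males and group 2 females.

```r
# One-sided p-value for Welch's test from summary statistics.
# alternative = "greater" means H_A: mu1 > mu2; "less" means H_A: mu1 < mu2.
one_sided_p <- function(xbar1, s1, n1, xbar2, s2, n2,
                        alternative = c("greater", "less")) {
  alternative <- match.arg(alternative)
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  # Orient the statistic so that large positive values support the alternative
  t_stat <- if (alternative == "greater") {
    (xbar1 - xbar2) / se
  } else {
    (xbar2 - xbar1) / se
  }
  pt(t_stat, df, lower.tail = FALSE)     # area to the right of the statistic
}

# Hypothetical summary statistics in $1000 units (not the data of this section)
p_greater <- one_sided_p(80, 8, 25, 75, 10, 18, alternative = "greater")
p_less    <- one_sided_p(80, 8, 25, 75, 10, 18, alternative = "less")
```

Because the first sample mean is the larger one in this hypothetical input, `p_greater` is below 0.5 and `p_less` is above 0.5. The two values sum to 1 by the symmetry of the t-distribution.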

The validity of this two-sample test depends on the assumption of normality of the populations.
If the populations are not normally distributed and if the sample sizes are not sufficiently large
to compensate for this non-normality via the Central Limit Theorem, then the p-values obtained
as described above will not be valid. There is a non-parametric test called the
**Wilcoxon Mann-Whitney rank sum test** that can be used in place of the two-sample
t-test. Most statistical computer packages include this test as part of their set of two-sample
test methods, but this method will not be discussed here.

**Example**. Suppose we wish to test the hypotheses

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_A: \mu_1 \ne \mu_2$$

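based on a random sample of 25 male financial analysts and a random sample of 18 female financial analysts using a 5% level of significance. Suppose that the salaries in these samples give , , , . It will be easier to express the salaries in units of \$1000 rather than in dollars, so the data become , , , . Then the test statistic is

$$T = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = 2.257.$$

The degrees of freedom given by Welch's approximation are rounded to 28 to obtain the p-value for this two-sided test. The p-value is computed as

`2*pt(-2.257, 28)`

which gives a p-value less than 0.05, so we reject the null hypothesis at the 5% level of significance.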

A confidence interval for the difference $\mu_1 - \mu_2$ between the population means has the form

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where the degrees of freedom for the t-value $t^*$ are the same as for the test statistic, and the standard deviation is the denominator of the test statistic.

A 95% confidence interval for the difference between the mean salaries for males and females, in units of \$1000, is $[0.5, 10.5]$.

This confidence interval expressed in dollars is [\$500, \$10,500]. That is, we are 95% confident that the difference between the population means is within this interval. Note that all of these values are positive, indicating that the mean for males is greater than the mean for females.
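
As a rough consistency check, the reported interval can be reproduced in R from the quantities that do appear in the example. This is a sketch under stated assumptions: the difference of 5.5 (in \$1000 units) is not given in the text and is taken here as the midpoint of the reported interval, and the standard error is then recovered from the reported test statistic 2.257.

```r
t_stat     <- 2.257   # test statistic reported in the example
df         <- 28      # rounded degrees of freedom reported in the example
diff_means <- 5.5     # ASSUMPTION: midpoint of the reported interval, in $1000 units

se     <- diff_means / t_stat        # standard error implied by the test statistic
t_crit <- qt(0.975, df)              # t-value for 95% confidence with 28 df
ci     <- diff_means + c(-1, 1) * t_crit * se

round(ci, 1)                         # 0.5 10.5
```

The same 28 degrees of freedom are used for the t-value as for the test statistic, and the standard error is the denominator of that statistic.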

There are situations in which we may wish to compare the variances of two populations. In that case, the test statistic is the ratio of the sample variances, $s_1^2 / s_2^2$. Statistical theory implies that if the populations are approximately normally distributed or the sample sizes are large, then under the assumption that the population variances are equal, the sampling distribution of this ratio is an F-distribution. This distribution has two parameters, the numerator and denominator degrees of freedom, given by $n_1 - 1$ and $n_2 - 1$. This implies that a test of the hypotheses

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{versus} \quad H_A: \sigma_1^2 \ne \sigma_2^2$$

can be constructed based on the ratio of sample variances. Since this test is inherently two-sided, in practice we divide the larger sample variance by the smaller sample variance, and the corresponding p-value is the area to the right of this ratio under the corresponding F-distribution. Note that we do not double this area to obtain the p-value for this test. For example, the data given above for the comparison of male and female financial analysts are based on sample sizes of $n_1 = 25$ and $n_2 = 18$. The ratio of the larger sample variance to the smaller is

$$F = 1.5.$$

Here the larger sample variance is that of the female sample, so the p-value is taken from the F-distribution with $n_2 - 1 = 17$ and $n_1 - 1 = 24$ degrees of freedom,

`pv = 1 - pf(1.5, 17, 24)`

which gives a p-value greater than 0.05, so at the 5% level there is no significant evidence that the population variances differ.
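
The larger-over-smaller convention described above can be sketched in R as follows. The helper `var_ratio_test` is hypothetical, and the two variances passed to it (100 and 150) are invented for illustration. They are chosen only so that their ratio is 1.5 and are not the sample variances of the example.

```r
# Variance-ratio test using the convention of this section:
# larger sample variance over smaller, right-tail area, not doubled.
var_ratio_test <- function(s1_sq, n1, s2_sq, n2) {
  if (s1_sq >= s2_sq) {
    f <- s1_sq / s2_sq
    df_num <- n1 - 1
    df_den <- n2 - 1
  } else {
    f <- s2_sq / s1_sq
    df_num <- n2 - 1
    df_den <- n1 - 1
  }
  p <- pf(f, df_num, df_den, lower.tail = FALSE)  # same as 1 - pf(f, df_num, df_den)
  list(statistic = f, df = c(df_num, df_den), p.value = p)
}

# Hypothetical variances: 100 for the sample of 25, 150 for the sample of 18
res <- var_ratio_test(100, 25, 150, 18)
```

With these inputs the statistic is 1.5 with 17 and 24 degrees of freedom, matching the call `1 - pf(1.5, 17, 24)` in the text. R's built-in `var.test` works from raw data and, for its default two-sided alternative, reports twice the smaller tail area, so its p-value will generally differ from the one produced by this convention.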

2015-11-12