
Chi-square test for independence in two-way frequency tables

Suppose a large corporation has 20 open positions for entry-level accountants, and all applications are initially screened to identify the applicants who satisfy the job requirements. Suppose there were 200 qualified applicants, 110 male and 90 female, and that of the 20 applicants hired, 16 were male and 4 were female. These results can be summarized in the following table:
           Hired   Not Hired   Total
Male          16          94     110
Female         4          86      90
Total         20         180     200
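As a minimal R sketch, the observed table can be entered as a 2 × 2 matrix and the hiring rates computed from its margins. The object names (observed, row_totals, overall_rate, rate_by_gender, etc.) are illustrative and not part of the original notes.

```r
# Observed frequencies (rows = gender, columns = outcome); names are illustrative
observed <- matrix(c(16, 94,
                      4, 86),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("M", "F"),
                                   Outcome = c("Hired", "Not Hired")))

# Margin totals: row totals (110, 90), column totals (20, 180), grand total 200
row_totals <- rowSums(observed)
col_totals <- colSums(observed)
grand_total <- sum(observed)

# Overall hiring rate, and hiring rate within each gender
overall_rate <- col_totals["Hired"] / grand_total
rate_by_gender <- observed[, "Hired"] / row_totals

# Print the rates as percentages, rounded to one decimal place
print(round(100 * c(Overall = unname(overall_rate), rate_by_gender), 1))
```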

Note that 10% (20/200) of all qualified applicants were hired, but 14.5% (16/110) of qualified male applicants and only 4.4% (4/90) of qualified female applicants were hired. So the chance of being hired appears to differ between males and females. We say that hiring and gender are independent if the probability of being hired is the same for males and for females, in which case it equals the overall probability, 0.10. Therefore, for exact independence between hiring and gender in this case, the company would have needed to hire 10% of qualified male applicants and 10% of qualified female applicants. The table of expected frequencies in this case would be
           Hired   Not Hired   Total
Male          11          99     110
Female         9          81      90
Total         20         180     200

Each expected frequency is the product of its row total and column total divided by the grand total; for example, the expected number of males not hired is (110)(180)/200 = 99.

We can measure the distance between the observed frequencies and the frequencies expected under independence by

\begin{displaymath}
D = \sum \frac{(O-E)^2}{E},
\end{displaymath}

where O is the observed frequency, E is the expected frequency under independence, and the sum is over the interior cells of the frequency table (the margin totals are excluded). In this example,

\begin{displaymath}
D = \frac{(16-11)^2}{11} + \frac{(94-99)^2}{99} + \frac{(4-9)^2}{9} + \frac{(86-81)^2}{81} = 5.612.
\end{displaymath}

If the categories are independent, the sampling distribution of D is approximately a chi-square distribution with (r-1)(c-1) degrees of freedom, where r is the number of rows and c is the number of columns of the frequency table, not counting the margin totals. (The approximation is usually considered adequate when every expected frequency is at least 5, as is the case here.) In this example r = c = 2, so there is (2-1)(2-1) = 1 degree of freedom. The p-value for a test of the hypotheses
$H_0:\ {\rm categories\ are\ independent}$
$H_1:\ {\rm categories\ are\ not\ independent}$
is the area to the right of D under the chi-square density with (r-1)(c-1) d.f. From R, the p-value is
1 - pchisq(5.612, 1) = 0.018.
At the 5% level of significance we therefore reject the null hypothesis and conclude that hiring and gender are not independent.
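To reproduce the whole calculation, here is a minimal R sketch. It assumes the observed counts are entered as a 2 × 2 matrix; the object names (observed, expected, D, dof, p_value, check) are illustrative rather than taken from the original notes. It computes the expected frequencies from the margin totals, the distance D, the degrees of freedom, and the upper-tail chi-square area, and then cross-checks the result against base R's chisq.test.

```r
# Observed frequencies (rows = gender, columns = outcome); names are illustrative
observed <- matrix(c(16, 94,
                      4, 86),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("M", "F"),
                                   Outcome = c("Hired", "Not Hired")))

# Expected frequencies under independence:
# (row total) * (column total) / (grand total) for every interior cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Distance D: sum over interior cells of (O - E)^2 / E
D <- sum((observed - expected)^2 / expected)

# Degrees of freedom (r - 1)(c - 1), margin totals not counted
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: area to the right of D under the chi-square density with dof d.f.
# (equivalent to 1 - pchisq(D, dof), without subtracting from 1)
p_value <- pchisq(D, dof, lower.tail = FALSE)

print(expected)
print(c(D = D, df = dof, p.value = p_value))

# Cross-check with base R. For 2 x 2 tables chisq.test applies Yates'
# continuity correction by default; correct = FALSE gives the plain D above.
check <- chisq.test(observed, correct = FALSE)
print(check)
```

Calling chisq.test(observed) without correct = FALSE would report a smaller statistic and a larger p-value, because Yates' correction shrinks each |O - E| by 0.5 before squaring.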


Larry Ammann
2014-10-21