Next: Homework and Project Assignments Up: Class Notes Previous: Polynomial Regression

Comparison of more than two groups

In regression, both the response variable and the independent variable are quantitative. If the independent variable is categorical, we use a different approach called analysis of variance (AOV). The initial approach is to compare the response across the subpopulations identified by the categorical variable. For example, consider a data set that represents the results of an experiment comparing the effectiveness of different treatments for anorexia patients. The patients were randomly assigned to one of three treatments: Cont (control), CBT, and FT. Each patient was weighed at the beginning of the treatment period, participated in the assigned treatment program, and was weighed again at the conclusion of the study. The goal of treatment for anorexia is to increase a patient's weight.

The basic research question of interest here is to determine what differences, if any, exist among these treatments. Initially this question will be considered by comparing the mean response among treatments, and a model that represents this can be expressed as:

$$Y_{ij} = \mu_i + \epsilon_{ij},$$

where $Y_{ij}$ represents the increase in weight (Postwt - Prewt) of the $j$-th patient in treatment group $i$, $\mu_i$ represents the mean increase for treatment group $i$, and $\epsilon_{ij}$, referred to as the error term, is the deviation of this patient from the group mean. This is the means model representation of the problem. The standard statistical assumption for this model is that the errors are independent and identically normally distributed with mean 0 and common variance $\sigma^2$. The assumption of identical variances within the groups is called homogeneity of variance. The basic research question can then be expressed as a test of hypotheses in which the null hypothesis is that all means are the same and the alternative is that at least two means differ.
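
The means model can be illustrated with a small simulation. The following is a minimal sketch, not the experiment's data: the group sizes, the "true" means in `mu`, and the standard deviation are all made-up values chosen only for illustration.

```r
# Hypothetical simulated data (all numbers are assumptions for illustration).
set.seed(1)
Group <- factor(rep(c("Cont", "CBT", "FT"), each = 20),
                levels = c("Cont", "CBT", "FT"))
mu <- c(Cont = 0, CBT = 3, FT = 7)      # assumed group means mu_i
Y  <- unname(mu[as.character(Group)]) + rnorm(length(Group), sd = 7)

# Each mu_i is estimated by the sample mean of its group.
group.means <- tapply(Y, Group, mean)

# Dropping the intercept fits the means model directly:
# one coefficient per group, equal to that group's sample mean.
means.fit <- lm(Y ~ Group - 1)
print(coef(means.fit))

# Estimated error terms: each observation minus its own group mean.
eps.hat <- Y - ave(Y, Group)
```

The coefficients of `means.fit` agree with `group.means`, and `eps.hat` agrees with `residuals(means.fit)`.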

This model can be reformulated to enable use of regression algorithms for the analysis. This is done by changing to an effects model in which

$$\mu_i = \mu + \alpha_i.$$

This model is over-specified, that is, it has one more parameter than there are groups, so we must add a constraint on the parameters. The default constraint used in R requires that $\alpha_1=0$, but other constraints are sometimes used. The null hypothesis for this parameterization is that all of the alphas equal 0. In the default case, $\mu$ represents the mean of the first group and $\alpha_i$ represents the difference between the mean of group $i$ and the mean of the first group, $\alpha_i = \mu_i - \mu_1$. The type of constraint used has no impact on how the hypotheses are tested, only on how the parameters are interpreted.
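
The effect of the constraint can be checked numerically. This sketch again uses simulated data with made-up means and standard deviation (assumptions for illustration only). It fits the effects model under R's default treatment contrasts and under a sum-to-zero constraint (`contr.sum`), one common alternative.

```r
# Hypothetical simulated data (all numbers are assumptions for illustration).
set.seed(2)
Group <- factor(rep(c("Cont", "CBT", "FT"), each = 20),
                levels = c("Cont", "CBT", "FT"))
Y <- rnorm(60, mean = c(0, 3, 7)[as.integer(Group)], sd = 7)

group.means <- tapply(Y, Group, mean)

# Default constraint (treatment contrasts): alpha_1 = 0.
# (Intercept) is the mean of the first group; the other coefficients
# are the differences of each group mean from the first group mean.
fit.trt <- lm(Y ~ Group)
print(coef(fit.trt))
print(group.means - group.means["Cont"])

# Alternative constraint: the alphas sum to zero.
# (Intercept) is now the average of the group means.
fit.sum <- lm(Y ~ Group, contrasts = list(Group = "contr.sum"))
print(coef(fit.sum))

# The overall F-test is the same under either constraint.
F.trt <- anova(fit.trt)[["F value"]][1]
F.sum <- anova(fit.sum)[["F value"]][1]
```

Only the interpretation of the coefficients changes between the two fits; `F.trt` and `F.sum` are equal.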

The process used to test these hypotheses can be summarized as follows.

  1. Test for homogeneity of variance using Levene's test, applied to the response variable Y and the grouping variable Group. If this test fails to reject, then there is no strong evidence against homogeneity of variance, so we proceed under this assumption.

  2. If variances are reasonably homogeneous, then fit an AOV model and check residual plots.
    Y.aov = aov(Y ~ Group)
    If the normality assumption is reasonable, then perform the overall F-test of equality of means. The default summary function returns the standard analysis of variance table. Parameter estimates can be obtained using summary.lm. The (Intercept) term in this summary refers to the mean of the first level of the grouping variable. The other terms represent deviations of the corresponding group means from the first group mean.
    If the overall F-test is significant, then pairwise comparisons of group means can be obtained with the pairwise.t.test function, which takes the response and the grouping variable as separate arguments rather than as a formula. Assuming reasonably homogeneous variances, we can use the pooled s.d., which is the default. The overall F-test controls experiment-wise error, so the p-values do not need to be adjusted.
    pairwise.t.test(Y, Group, p.adjust.method="none")

  3. If Levene's test rejects, then homogeneity of variance is not a reasonable assumption. In that case we can compare group means pairwise using two-sample t-tests that do not pool the standard deviation (pool.sd=FALSE). However, since the overall F-test is no longer available to control experiment-wise error, we must adjust the p-values of the individual two-sample t-tests. Holm's adjustment (holm) is a reasonable default in most situations: it controls the family-wise error rate and is never less powerful than the Bonferroni adjustment.
    pairwise.t.test(Y, Group, p.adjust.method="holm", pool.sd=FALSE)
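
The three steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the course script: the data are simulated with made-up means and standard deviation, the 0.05 cutoff is an assumed significance level, and Levene's test is computed by hand in base R as a one-way AOV on absolute deviations from the group medians. The `leveneTest` function in the car package performs this test, if that package is installed.

```r
# Hypothetical simulated data (all numbers are assumptions for illustration).
set.seed(3)
Group <- factor(rep(c("Cont", "CBT", "FT"), each = 20),
                levels = c("Cont", "CBT", "FT"))
Y <- rnorm(60, mean = c(0, 3, 7)[as.integer(Group)], sd = 7)

alpha.level <- 0.05   # assumed significance level

# Step 1: Levene's test (median-centered version), done by hand:
# a one-way AOV on the absolute deviations from each group's median.
abs.dev  <- abs(Y - ave(Y, Group, FUN = median))
levene.p <- anova(lm(abs.dev ~ Group))[["Pr(>F)"]][1]

if (levene.p > alpha.level) {
  # Step 2: no strong evidence against homogeneity, so fit the AOV model.
  Y.aov <- aov(Y ~ Group)
  # plot(Y.aov, which = 1:2)   # residual plots; run in an interactive session
  print(summary(Y.aov))        # AOV table with the overall F-test
  print(summary.lm(Y.aov))     # parameter estimates
  # Pairwise comparisons with pooled s.d. and unadjusted p-values;
  # these are interpreted only if the overall F-test is significant.
  pw <- pairwise.t.test(Y, Group, p.adjust.method = "none")
} else {
  # Step 3: variances differ, so use unpooled two-sample t-tests
  # with Holm-adjusted p-values.
  pw <- pairwise.t.test(Y, Group, p.adjust.method = "holm", pool.sd = FALSE)
}
print(pw)
```

Either branch leaves the pairwise p-values in `pw$p.value`, a lower-triangular matrix with one entry per pair of groups.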

An example of AOV is given in the following script:
