In regression, both the response variable and the independent variable are quantitative. If the
independent variable is categorical, then we must use a different approach, referred to as
analysis of variance (AOV). The initial approach is to compare the response across the different
subpopulations identified by the categorical variable.
For example, the data set in
represents the results of an experiment to compare the effectiveness of different treatments for anorexia patients. These patients were randomly assigned to one of three treatments: Cont (control), CBT, and FT. Each patient was weighed at the beginning of the treatment period, participated in the assigned treatment program, and was weighed again at the conclusion of the study. The goal of treatment for anorexia is to increase a patient's weight.
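The basic research question of interest here is to determine what differences, if any, exist among
these treatments. Initially this question will be considered by comparing the mean response among
treatments, and a model that represents this can be expressed as:
$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, k, \quad j = 1, \ldots, n_i,$$
where $Y_{ij}$ is the response of the $j$-th patient in treatment group $i$, $\mu_i$ is the population mean response for treatment $i$, and the errors $\varepsilon_{ij}$ are assumed to be independent and normally distributed with mean 0 and a common standard deviation $\sigma$.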
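This model can be reformulated to enable use of regression algorithms for the analysis. This is
done by changing to an effects model in which the mean $\mu_i$ of each group $i = 1, \ldots, k$ is expressed relative to the mean of the first group:
$$\mu_1 = \mu, \qquad \mu_i = \mu + \alpha_i, \quad i = 2, \ldots, k,$$
so that $\alpha_i$ is the deviation of the mean of group $i$ from the mean of the first group. The hypotheses of interest are then
$$H_0: \alpha_2 = \cdots = \alpha_k = 0 \quad \text{versus} \quad H_A: \alpha_i \neq 0 \text{ for at least one } i.$$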
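The link between group means and regression coefficients can be checked numerically. The following is a minimal sketch on simulated data; the names y, g, fit, and the simulated values are illustrative assumptions and are not part of the course data. It relies on R's default treatment coding for factors, under which the first level serves as the reference group.

```r
# Simulated data: three groups with different means (illustrative only)
set.seed(1)
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = c(10, 12, 15)[as.integer(g)], sd = 2)

# Sample mean of each group, as a plain named vector
group.means <- sapply(split(y, g), mean)

# Fit the model by least squares; with the default treatment contrasts
# the first level ("A") is the reference group
fit <- lm(y ~ g)
b <- coef(fit)

# The intercept should equal the mean of the first group, and the other
# coefficients should equal deviations of group means from that mean
dev.from.first <- group.means[c("B", "C")] - group.means["A"]
print(group.means)
print(b)
print(all.equal(unname(b[1]), unname(group.means["A"])))
print(all.equal(unname(b[2:3]), unname(dev.from.first)))
```

Both comparisons should print TRUE, which is why the regression output can be read directly in terms of group means.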
The process used to test these hypotheses can be summarized as follows.
First, check the assumption of homogeneity of variance with Levene's test:
    library(lawstat)
    levene.test(Y, Group)
where Y is the name of the response variable and Group is the name of the grouping variable. If this test fails to reject, then there is no strong evidence against homogeneity of variance, so we proceed under this assumption.
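To see what this test measures, it can be reproduced with base R alone. The sketch below assumes the median-centered form of Levene's test (which is, to my understanding, the default location in lawstat's levene.test; treat that as an assumption to verify). In that form the test is simply a one-way AOV F-test applied to the absolute deviations of each observation from its own group median. The data are simulated, and the names abs.dev, lev.table, lev.F, and lev.p are hypothetical.

```r
# Simulated data: three groups with equal spread (illustrative only)
set.seed(2)
Group <- factor(rep(c("A", "B", "C"), each = 12))
Y <- rnorm(36, mean = c(0, 1, 2)[as.integer(Group)], sd = 1.5)

# Absolute deviation of each observation from its own group median
abs.dev <- abs(Y - ave(Y, Group, FUN = median))

# One-way AOV on the absolute deviations: the F-test asks whether the
# average spread differs among groups
lev.table <- anova(lm(abs.dev ~ Group))
lev.F <- lev.table[["F value"]][1]
lev.p <- lev.table[["Pr(>F)"]][1]
print(c(F = lev.F, p.value = lev.p))
```

A large p-value here is read the same way as in the text: no strong evidence against homogeneity of variance.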
Next, fit the model and check the normality assumption using the plot of residuals versus fitted values and the normal quantile-quantile plot of the residuals:
    Y.aov = aov(Y ~ Group)
    plot(Y.aov, which=1:2)
If the normality assumption is reasonable, then perform the overall F-test of equality of means. The default summary function returns the standard analysis of variance table. Parameter estimates can be obtained using summary.lm. The (Intercept) term in this summary refers to the mean of the first level of the grouping variable. The other terms represent deviations of the corresponding group means from the first group mean.
    summary(Y.aov)
    summary.lm(Y.aov)
If the overall F-test is significant, then pairwise comparisons of group means can be obtained with the pairwise.t.test function. If the variances are reasonably homogeneous, then we can use the pooled s.d. The overall F-test controls experiment-wise error, so p-values don't need to be adjusted.
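    pairwise.t.test(Y, Group, p.adjust.method="none")
If homogeneity of variance is doubtful, the comparisons can instead use separate standard deviations for each pair, here with Holm-adjusted p-values:
    pairwise.t.test(Y, Group, pool.sd=FALSE, p.adjust.method="holm")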
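The last two steps can be tried end to end on simulated data. This is only a sketch under stated assumptions: the data are generated rather than taken from the anorexia study, and the names aov.table, overall.p, pw.pooled, and pw.separate are hypothetical. Note that pairwise.t.test takes the response and the grouping factor as two separate arguments rather than a formula.

```r
# Simulated data: group C has a clearly larger mean (illustrative only)
set.seed(3)
Group <- factor(rep(c("A", "B", "C"), each = 15))
Y <- rnorm(45, mean = c(0, 0.5, 3)[as.integer(Group)], sd = 1)

# Overall F-test of equality of means
Y.aov <- aov(Y ~ Group)
aov.table <- summary(Y.aov)[[1]]
overall.p <- aov.table[["Pr(>F)"]][1]
print(aov.table)

# Pairwise comparisons with pooled s.d. and unadjusted p-values
pw.pooled <- pairwise.t.test(Y, Group, p.adjust.method = "none")
print(pw.pooled$p.value)

# Pairwise comparisons with separate s.d. and Holm-adjusted p-values
pw.separate <- pairwise.t.test(Y, Group, pool.sd = FALSE,
                               p.adjust.method = "holm")
print(pw.separate$p.value)
```

Each p.value component is a lower-triangular matrix whose rows and columns are labeled by group level, so the entry in row C and column A is the comparison of groups C and A.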
An example of AOV is given in the following script: http://www.utdallas.edu/~ammann/stat3355scripts/anorexia.r