In regression, both the response variable and the independent variable are quantitative. If the
independent variable is categorical, then we
must use a different approach referred to as analysis of variance (AOV). The initial approach is
to compare the response across different subpopulations identified by the categorical variable.
For example, the data set in

http://www.utdallas.edu/~ammann/stat3355scripts/anorexia.txt

represents the results of an experiment to compare effectiveness of different treatments for
anorexia patients. These patients were randomly assigned to one of 3 treatments: *Cont*
(control), *CBT*, and *FT*. Each patient was weighed at the beginning of the treatment
period, participated in the assigned treatment program, and then was weighed again at the conclusion
of the study. The goal of treatment for anorexia is to increase a patient's weight.

The basic research question of interest here is to determine what differences, if any, exist among
these treatments. Initially this question will be considered by comparing the mean response among
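treatments, and a model that represents this can be expressed as:

$$Y_{ij} = \mu_i + \epsilon_{ij},$$

where $Y_{ij}$ represents the increase in weight (weight at the conclusion of the study minus weight at the beginning) of patient $j$ in treatment group $i$, $\mu_i$ is the population mean increase for treatment $i$, and $\epsilon_{ij}$ is random error with mean 0. Under this model, the question of whether the treatments differ corresponds to testing $H_0: \mu_1 = \mu_2 = \mu_3$ against the alternative that at least two of the means differ.

This model can be reformulated to enable use of regression algorithms for the analysis. This is
done by changing to an effects model in which

$$\mu_i = \mu + \alpha_i, \quad \text{so that} \quad Y_{ij} = \mu + \alpha_i + \epsilon_{ij}.$$

This model is over-specified, that is, there is one more parameter than groups, so we must add a constraint on the parameters. The default constraint used in R is $\alpha_1 = 0$, so that $\mu$ is the mean of the first group and each remaining $\alpha_i$ is the deviation of the mean of group $i$ from the first group mean.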
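The effect of this kind of constraint can be checked on a small example. The sketch below uses made-up numbers and hypothetical group labels *A*, *B*, *C* (not the anorexia data) and assumes R's default treatment contrasts, under which the effect of the first factor level is set to 0.

```r
# Hypothetical toy data: gains for three made-up groups A, B, C
gain  <- c(1, 3, 2,   5, 7, 6,   2, 4, 3)
group <- factor(rep(c("A", "B", "C"), each = 3))

# Means model: one mean per group (A = 2, B = 6, C = 3)
group.means <- tapply(gain, group, mean)

# Effects model fit by regression with the default treatment contrasts:
# the effect of the first level ("A") is constrained to be 0
fit <- lm(gain ~ group)
coef(fit)   # (Intercept) = 2, groupB = 4, groupC = 1

# The intercept is the first group mean; the remaining coefficients
# are the differences between each other group mean and the first
diffs <- group.means[-1] - group.means[1]
diffs       # B = 4, C = 1
```

Three group means are described here by three free parameters (the intercept and two deviations), which is why one of the four parameters in the effects model has to be fixed.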

The process used to test these hypotheses can be summarized as follows.

- Test for homogeneity of variance using Levene's test:

      library(lawstat)        # provides levene.test
      levene.test(Y, Group)

  where *Y* is the name of the response variable and *Group* is the name of the grouping variable. If this test fails to reject, then there is no strong evidence against homogeneity of variance, so we proceed under this assumption.

- If variances are reasonably homogeneous, then fit an AOV model and check residual plots.

      Y.aov = aov(Y ~ Group)
      plot(Y.aov, which=1:2)  # residuals vs. fitted values, normal Q-Q plot

  If the normality assumption is reasonable, then perform the overall F-test of equality of means. The default *summary* function returns the standard analysis of variance table. Parameter estimates can be obtained using *summary.lm*. The *(Intercept)* term in this summary refers to the mean of the first level of the grouping variable. The other terms represent deviations of the corresponding group means from the first group mean.

      summary(Y.aov)
      summary.lm(Y.aov)

  If the overall F-test is significant, then pairwise comparisons of group means can be obtained with the *pairwise.t.test* function, which takes the response and the grouping variable as separate arguments rather than a formula. Since the variances are reasonably homogeneous, we can use the pooled s.d. The overall F-test controls experiment-wise error, so p-values don't need to be adjusted.

      pairwise.t.test(Y, Group, p.adjust.method="none")

- If Levene's test rejects, then homogeneity of variance is not a reasonable assumption. In
  that case we can make pairwise comparisons of group means using two-sample t-tests that do
  not pool the variances, with p-values adjusted by Holm's method.

      pairwise.t.test(Y, Group, pool.sd=FALSE, p.adjust.method="holm")
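The whole process can be strung together as one script. The following is only a sketch under several assumptions: the data are simulated stand-ins rather than the anorexia measurements, the 0.05 cutoff is an illustrative choice, and Levene's test is computed by hand in base R (a one-way AOV on absolute deviations from the group medians, the Brown-Forsythe variant) so that the example does not require the *lawstat* package.

```r
# Simulated stand-in data (hypothetical; not the anorexia measurements)
set.seed(1)
Group <- factor(rep(c("Cont", "CBT", "FT"), each = 20))
Y <- rnorm(60, mean = rep(c(0, 3, 7), each = 20), sd = 7)

# Step 1: Levene-type test done by hand as a one-way AOV on the
# absolute deviations from each group's median
abs.dev  <- abs(Y - ave(Y, Group, FUN = median))
levene.p <- anova(lm(abs.dev ~ Group))[1, "Pr(>F)"]

if (levene.p > 0.05) {   # assumed cutoff, for illustration only
  # Step 2: variances look reasonably homogeneous, so fit the AOV model
  Y.aov <- aov(Y ~ Group)
  print(summary(Y.aov))      # overall F-test
  print(summary.lm(Y.aov))   # parameter estimates
  # Step 3: pooled-s.d. pairwise comparisons, unadjusted; these are
  # interpreted only when the overall F-test is significant
  pw <- pairwise.t.test(Y, Group, p.adjust.method = "none")
} else {
  # Unequal variances: separate-variance t-tests with Holm adjustment
  pw <- pairwise.t.test(Y, Group, pool.sd = FALSE, p.adjust.method = "holm")
}
pw$p.value   # matrix of pairwise p-values
```

The residual plots from `plot(Y.aov, which=1:2)` are left out so that the sketch runs non-interactively; in an actual analysis they would be inspected before relying on the F-test.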

An example of AOV is given in the following script: http://www.utdallas.edu/~ammann/stat3355scripts/anorexia.r

2015-05-01