Polynomial Regression

The plot of Residuals versus Weight for the Cars data shows some curvature, suggesting a quadratic component in the relationship between Weight and Mileage, so let's consider adding one to our model. The regression model now has the form,

\begin{displaymath}
Y = \beta_0 + \beta_1X + \beta_2X^2 + \epsilon
\end{displaymath}

with the same assumptions regarding $\epsilon$ as before. This can be accomplished by constructing a new variable,
Weight2 = Weight^2
and adding it to the model formula.
Mileage1.lm = lm(Mileage ~ Weight, data=Cars)
#residual plots for linear model
plot(Mileage1.lm,which=1:2,pch=19)
#residual plots for quadratic model
Weight2 = Cars$Weight^2
Mileage2.lm = lm(Mileage ~ Weight + Weight2, data=Cars)
plot(Mileage2.lm,which=1:2,pch=19)
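As an aside, the same quadratic fit can be obtained without constructing a separate Weight2 variable by wrapping the squared term in I() inside the formula (a standard R idiom; I() protects ^ from being interpreted as formula syntax):

```r
# equivalent quadratic fit with the squared term built into the formula
Mileage2a.lm = lm(Mileage ~ Weight + I(Weight^2), data=Cars)
```

One advantage of this form is that predict() then only needs Weight in its newdata argument, since the model object carries the transformation itself.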

Note that the residual plots now look very consistent with the standard assumptions. The next question to consider is whether or not the quadratic term in the model is statistically significant. This corresponds to testing the hypotheses,

\begin{eqnarray*}
H_0:\ \beta_2 &=& 0\\
H_A:\ \beta_2 &\ne& 0
\end{eqnarray*}

with no restrictions on $\beta_0, \beta_1$. This is accomplished using a partial F-test. This test has the general form:

\begin{displaymath}
F = \frac{\frac{SSE(reduced)-SSE(full)}{df(reduced)-df(full)}}{\frac{SSE(full)}{df(full)}},
\end{displaymath}

where the full model refers to the quadratic model in this case and the reduced model is the linear model. The sampling distribution of this test statistic is F with numerator d.f.

\begin{displaymath}
df(reduced)-df(full)
\end{displaymath}

which is equivalent to the difference in number of parameters for the two models, and with denominator d.f. given by df(full) = n-3 in this case. This test is performed in R using the anova() function,
anova(Mileage1.lm,Mileage2.lm)
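To see what anova() is computing, the partial F statistic can be assembled by hand from the two fitted models. This is a sketch using the standard deviance() and df.residual() extractors; the names Fstat and pval are just illustrative:

```r
# SSE and residual df for the reduced (linear) and full (quadratic) models
sse.red  = deviance(Mileage1.lm);  df.red  = df.residual(Mileage1.lm)
sse.full = deviance(Mileage2.lm);  df.full = df.residual(Mileage2.lm)
# partial F statistic and its p-value from the F distribution
Fstat = ((sse.red - sse.full)/(df.red - df.full)) / (sse.full/df.full)
pval  = 1 - pf(Fstat, df.red - df.full, df.full)
```

These values should match the F value and Pr(>F) column reported by anova().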
This shows that the quadratic term is statistically significant with p-value = 0.001577. Since the residual plots show no remaining curvature, we can stop with this quadratic model. If we wanted to consider a cubic model, we could follow the same process: construct a new variable, Weight3 = Weight^3, and add it to the model formula.
Weight3 = Cars$Weight^3
Mileage3.lm = lm(Mileage ~ Weight + Weight2 + Weight3, data=Cars)
anova(Mileage1.lm,Mileage2.lm,Mileage3.lm)
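Note that when anova() is given a sequence of nested models like this, it reports sequential partial F-tests: first the quadratic model against the linear model, then the cubic model against the quadratic. To test only the cubic term, the last two models can also be compared directly:

```r
# partial F-test for the cubic term alone (full = cubic, reduced = quadratic)
anova(Mileage2.lm, Mileage3.lm)
```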

Now let's construct a plot similar to what we did for the linear model, but now we will use the quadratic model.

wgt.range = range(Cars$Weight)
W1 = seq(wgt.range[1],wgt.range[2],length=250)
W = data.frame(Weight=W1,Weight2=W1^2)
W.conf = predict(Mileage2.lm,newdata=W,interval="confidence")
W.pred = predict(Mileage2.lm,newdata=W,interval="prediction")
Y.lim = range(W.pred) #needed to ensure the plot contains all of the prediction curves
pcol = c("red","DarkGreen","blue")
plot(Mileage ~ Weight,data=Cars,pch=19,ylim=Y.lim)
title("Mileage vs Weight")
mtext("with 95% Prediction and Confidence Intervals for Quadratic Model",line=0.25)
lines(W$Weight,W.conf[,"fit"],col=pcol[1])
lines(W$Weight,W.conf[,"lwr"],col=pcol[2])
lines(W$Weight,W.conf[,"upr"],col=pcol[2])
lines(W$Weight,W.pred[,"lwr"],col=pcol[3])
lines(W$Weight,W.pred[,"upr"],col=pcol[3])
legend(wgt.range[1],Y.lim[1],legend=c("Fit","Confidence","Prediction"),
      lty=1,col=pcol,yjust=0,cex=.8)

[Figure: Mileage vs Weight, with 95% prediction and confidence intervals for the quadratic model]


Larry Ammann
2014-12-08