The plot of Residuals versus Weight for the Cars data shows some curvature in the form of a
possible quadratic component in the relatiohship between Weight and Mileage, so let's consider
adding that to our model. The regression model now has the form,

with the same assumptions regarding as before. This can be accomplished by constructing a new variable,

Weight2 = Weight^2and adding it to the model formula.

Mileage1.lm = lm(Mileage ~ Weight, data=Cars) #residual plots for linear model plot(Mileage1.lm,which=1:2,pch=19) #residual plots for quadratic model Weight2 = Cars$Weight^2 Mileage2.lm = lm(Mileage ~ Weight + Weight2, data=Cars) plot(Mileage2.lm,which=1:2,pch=19)

Note that the residual plots now look very consistent with the standard assumptions. The next question to consider is whether or not the quadratic term in the model is statistically significant. This corresponds to testing the hypotheses,

with no restrictions on . This is accomplished using a partial F-test. This test has the general form:

where the full model refers to the quadratic model in this case and the reduced model is the linear model. The sampling distribution of this test statistic is F with numerator d.f.

which is equivalent to the difference in number of parameters for the two models, and with denominator d.f. given by

anova(Mileage1.lm,Mileage2.lm)This shows that the quadratic term is statistically significant with p-value = 0.001577. Since the residual plots show no remaining curvature, we can stop with this quadratic model. If we wanted to consider a cubic model, we can follow the same process. and adding it to the model formula.

Weight3 = Cars$Weight^3 Mileage3.lm = lm(Mileage ~ Weight + Weight2 + Weight3, data=Cars) anova(Mileage1.lm,Mileage2.lm,Mileage3.lm)

Now let's construct a plot similar to what we did for the linear model, but now we will use the quadratic model.

wgt.range = range(Cars$Weight) W1 = seq(wgt.range[1],wgt.range[2],length=250) W = data.frame(Weight=W1,Weight2=W1^2) W.conf = predict(Mileage2.lm,newdata=W,interval="confidence") W.pred = predict(Mileage2.lm,newdata=W,interval="prediction") Y.lim = range(W.pred) #needed to ensure plot will contain all of prediction curves pcol = c("red","DarkGreen","blue") plot(Mileage ~ Weight,data=Cars,pch=19,ylim=Y.lim) title("Mileage vs Weight") mtext("with 95% Prediction and Confidence Intervals for Quadratic Model",line=0.25) lines(W$Weight,W.conf[,"fit"],col=pcol[1]) lines(W$Weight,W.conf[,"lwr"],col=pcol[2]) lines(W$Weight,W.conf[,"upr"],col=pcol[2]) lines(W$Weight,W.pred[,"lwr"],col=pcol[3]) lines(W$Weight,W.pred[,"upr"],col=pcol[3]) legend(wgt.range[1],Y.lim[1],legend=c("Fit","Confidence","Prediction"), lty=1,col=pcol,yjust=0,cex=.8)

2016-04-26