Due date: Oct. 26, 2017
STATE: state
CIG: cigarette consumption
BLAD: bladder cancer
LUNG: lung cancer
KID: kidney cancer
LEUK: leukemia

Note that STATE should be used as row names, not as a variable. However, there is a built-in data set in R, state.region, that categorizes states into four regions: Northeast, South, North Central, and West. Use this variable as an additional factor. Since the Smoking data includes DC but state.region does not, assign the region for DC to be South, since both Maryland and Virginia are included in that region. This can be done with a lookup table. Note that state.region is a factor, so to add an entry for DC, we must first convert state.region to an ordinary character vector and then combine that vector with the region for DC. This new vector must then be converted to a factor when it is added to the Smoking data frame.
Region = c(as.vector(state.region),"South")
names(Region) = c(state.abb,"DC")
Smoking$Region = factor(Region[dimnames(Smoking)[[1]]])

[a] Fit models to predict bladder cancer rate based on cigarette consumption and Region. Consider three models: CIG only, CIG+Region, and CIG*Region. Use a 5% level of significance for the partial-F tests to determine which model to use.
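Partial-F tests for nested models of this form can be carried out with anova(). The following is only a sketch of the mechanics, assuming the built-in iris data as a stand-in (it is not the Smoking data): Petal.Length plays the role of the quantitative predictor and Species the role of the factor. The object names fit1, fit2, fit3, and tab are illustrative.

```r
# Illustrative stand-in data: iris, not the Smoking data frame.
# Three nested models: slope only, parallel lines, separate lines.
fit1 <- lm(Sepal.Length ~ Petal.Length, data = iris)
fit2 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
fit3 <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)

# Sequential partial-F tests: row 2 tests fit1 vs. fit2,
# row 3 tests fit2 vs. fit3.
tab <- anova(fit1, fit2, fit3)
print(tab)

# p-values to compare against the chosen significance level
tab[["Pr(>F)"]]
```

When more than two models are passed to anova(), every F statistic uses the residual mean square of the largest model as its denominator. To test two models against each other using only the larger of the pair, call anova() on just those two fits.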
Y0.lm = lm(Y ~ 1, data=X)
Yall.lm = lm(Y ~ ., data=X)
Ystep.lm = step(Y0.lm,direction="forward",scope=list(lower=Y0.lm, upper=Yall.lm))

The default value for argument k in this function is 2, which corresponds to the AIC selection criterion. To use BIC, the argument k=log(n) must be included, where n is the number of observations. Note that the argument data=X must be used for the intercept-only model even though the predictor variables are not used in that model. Both models in the scope argument must use the same data frame.
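As a sketch of the BIC variant, the same forward selection can be run with k=log(n). The example below assumes the built-in mtcars data frame as a stand-in for X, with mpg as the response; the names fit0, fitall, and fit_bic are illustrative, and trace=0 is added only to suppress the step-by-step printout.

```r
# Illustrative stand-in data: mtcars plays the role of X, mpg the role of Y.
X <- mtcars
n <- nrow(X)   # number of observations, needed for the BIC penalty

# Both scope models are fitted to the same data frame.
fit0   <- lm(mpg ~ 1, data = X)   # intercept-only model
fitall <- lm(mpg ~ ., data = X)   # all available predictors

# k = log(n) replaces the default AIC penalty (k = 2) with the BIC penalty.
fit_bic <- step(fit0, direction = "forward",
                scope = list(lower = fit0, upper = fitall),
                k = log(n), trace = 0)

# Formula of the selected model
print(formula(fit_bic))
```

Because log(n) exceeds 2 once n is 8 or more, the BIC penalty per parameter is heavier than the AIC penalty, so BIC tends to select a model that is no larger than the one chosen by AIC.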