1. Data for this problem is in the file
This data contains property tax amounts for a sample of houses along with related physical attributes of the houses. The problem is to understand how taxes are determined from the other variables.
a) Fit a regression model to predict Taxes based on the other variables in this dataset, check assumptions, and make any transformations if needed. Summarize the model and include diagnostic plots to show that assumptions have been verified.
b) Use BIC to reduce the model to just the important predictor variables and provide a summary of the reduced model.
c) Identify any observations that have studentized residuals in the reduced model with absolute value greater than 2. For those observations, compare the actual tax to the 95% prediction interval for that observation's tax and interpret.
d) Let p denote the number of parameters in the reduced model (including the intercept) and let n denote the number of observations. We consider an observation to be influential if
2. Use data in
This file contains average January minimum temperatures in degrees Fahrenheit from 1931 to 1960 for 51 U.S. cities. The Pacific coast cities Los Angeles, San Francisco, Portland, and Seattle were removed since their winter temperatures are controlled mainly by Pacific Ocean currents.
a) Construct an informative plot of temperature versus latitude.
b) Fit a model to predict January minimum temperature based on latitude and longitude. Interpret the coefficients of this model.
c) Are the model assumptions reasonable?
d) Is longitude an important predictor? Use 5% level of significance. If it is not significant, refit the model with just latitude.
e) The latitude of Richardson is 33.0 with a longitude of 96.75. Use your regression model in d) to predict the January minimum temperature for Richardson and obtain a 90% prediction interval for this temperature. Richardson's actual January minimum temperature is 34. How does that compare to temperatures in the prediction interval?
f) How does Richardson's actual January minimum temperature compare to a 90% confidence interval for the mean temperature of all cities at the same latitude?
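Parts e) and f) turn on the difference between a prediction interval for one new city and a confidence interval for the mean of all cities at a given latitude. The sketch below shows how the two intervals are formed for a latitude-only model. It is an illustration under stated assumptions, not a solution: the latitude and temperature values are made up and are not the assignment data, the function name simple_ols_intervals is hypothetical, and the critical value 1.943 is the t-table value for a 90% interval with 6 degrees of freedom, which only matches the made-up sample of 8 points.

```python
import math

def simple_ols_intervals(x, y, x0, t_crit):
    """Fit y = b0 + b1*x by least squares and return the fitted value at x0
    with a confidence interval for the mean response and a prediction
    interval for a single new response. t_crit is the two-sided t critical
    value for n - 2 degrees of freedom, supplied by the caller."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    fit = intercept + slope * x0
    # Standard error of the estimated mean response at x0.
    se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)
    # Standard error for one new observation at x0 (extra "1 +" term).
    se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)

    return {
        "slope": slope,
        "intercept": intercept,
        "fit": fit,
        "ci": (fit - t_crit * se_mean, fit + t_crit * se_mean),
        "pi": (fit - t_crit * se_pred, fit + t_crit * se_pred),
    }

# Made-up illustrative values, NOT the assignment data.
lat = [26.0, 30.0, 33.5, 36.0, 39.0, 41.5, 44.0, 47.0]
temp = [55.0, 46.0, 38.0, 31.0, 26.0, 19.0, 14.0, 5.0]

# 1.943 is the t critical value for a 90% interval with n - 2 = 6 df.
result = simple_ols_intervals(lat, temp, 33.0, 1.943)
print("fit:", result["fit"])
print("90% CI for mean:", result["ci"])
print("90% PI for new city:", result["pi"])
```

The prediction interval is always wider than the confidence interval at the same latitude because it includes the variability of an individual city around the mean, which is why a single observed temperature can fall inside the former while lying outside the latter.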
3. The file
contains stellar magnitudes (luminosity) and log(period) for a family of variable stars called Cepheid variables in the Small Magellanic Cloud. The first column of this file gives IDs for the Cepheids and so can be used as row names. These variable stars are important to astronomers because the periods of their variability (logPeriod) are directly related to their luminosity. This enables astronomers to estimate distances of these stars from their periods. Two types of Cepheids are contained in this data set, FU and FO, and these types have slightly different period-luminosity relationships. Note: stellar magnitudes are reversed in the sense that a higher value for magnitude corresponds to a dimmer star. Also, BV = B-V and VI = V-I, so those variables should be ignored.
a) Fit a model to predict MV based on I, V, B, logPeriod, and Type that includes all two-way interactions between Type and the other variables. Summarize this model and include diagnostic plots to check assumptions.
b) Define as high-residual outliers stars with studentized residuals greater than 3 in absolute value, and define as high-leverage outliers stars with