1. Data for this problem is in the file

www.utdallas.edu/~ammann/stat6341scripts/Tax.csv

This data contains property tax amounts for a sample of houses along with related physical
attributes of the houses. The problem is to understand how taxes are determined from the other
variables.

a) Fit a regression model to predict *Taxes* based on the other variables in this dataset,
check assumptions, and make any transformations if needed. Summarize the model and include
diagnostic plots to show that assumptions have been verified.

b) Use BIC to reduce the model to just the important predictor variables and provide a summary of the
reduced model.
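As a reminder of what the BIC comparison in part b) measures, here is a minimal sketch. It assumes a least-squares fit with Gaussian errors and drops additive constants that are the same for every model fitted to the same data. The function name `bic_gaussian` and all numbers are hypothetical and are not taken from Tax.csv.

```python
import math

def bic_gaussian(rss, n, p):
    """BIC of a least-squares fit, up to a constant shared by all models on the same data.

    rss: residual sum of squares, n: number of observations,
    p: number of regression parameters (including the intercept).
    """
    return n * math.log(rss / n) + p * math.log(n)

# Hypothetical values: dropping two predictors raises RSS only slightly.
n = 100
bic_full = bic_gaussian(rss=50.0, n=n, p=6)
bic_reduced = bic_gaussian(rss=51.0, n=n, p=4)

# The model with the smaller BIC is preferred.
preferred = "reduced" if bic_reduced < bic_full else "full"
print(preferred)
```

Each parameter costs log(n) in the penalty, so a predictor stays in the model only if it lowers the first term by more than that amount.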

c) Identify any observations whose studentized residuals in the reduced model have absolute
value greater than 2. For each of those observations, compare the actual tax to the 95% prediction
interval for its tax and interpret.

d) Let p denote the number of parameters in the reduced model (including the intercept) and let
n denote the number of observations. We consider an observation to be influential if

Are any of the observations in c) influential by this definition? Note that such observations would have high residuals and high influence. Remove those observations, refit the model, and then reduce this model using BIC. How does this model differ from the model in part b?

e) Use the model from part d) to obtain 95% prediction intervals for the taxes of the observations that were removed. How do these prediction intervals compare to the ones obtained in part c? How do the actual taxes of the removed observations compare to the new prediction intervals for their taxes?
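Parts c) and d) rely on studentized residuals and leverage. The following is a minimal sketch of how those quantities are computed. It assumes a model with a single predictor so that no matrix algebra is needed, and it uses externally studentized (deleted) residuals. The function name and the data are made up for illustration and are not from Tax.csv.

```python
import math

def studentized_residuals(x, y):
    """Externally studentized residuals for a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in resid) / (n - p))  # residual standard error
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage of this observation
        r = e / (s * math.sqrt(1 - h))  # internally studentized residual
        # Convert to the externally studentized (leave-one-out) version.
        out.append(r * math.sqrt((n - p - 1) / (n - p - r ** 2)))
    return out

# Made-up data: y is close to 2x + 1 except for the observation at x = 4.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 14.0, 10.8, 13.1, 15.0, 16.9]
t = studentized_residuals(x, y)
flagged = [i for i, ti in enumerate(t) if abs(ti) > 2]
print(flagged)
```

With several predictors the leverage values come from the diagonal of the hat matrix instead, but the studentization step is the same.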

2. Use data in

http://www.utdallas.edu/~ammann/stat6341scripts/Temperature1.data

This file contains average January minimum temperatures in degrees Fahrenheit from 1931 to 1960 for
51 U.S. cities. The Pacific coast cities Los Angeles, San Francisco, Portland, and Seattle were
removed since their winter temperatures are controlled mainly by Pacific Ocean currents.

a) Construct an informative plot of temperature versus latitude.

b) Fit a model to predict January minimum temperature based on latitude and longitude. Interpret the
coefficients of this model.

c) Are the model assumptions reasonable?

d) Is longitude an important predictor? Use a 5% level of significance. If it is not significant,
refit the model with just latitude.

e) Richardson has a latitude of 33.0 and a longitude of 96.75. Use your regression model
in d) to predict the January minimum temperature for Richardson and obtain a 90% prediction
interval for this temperature. Richardson's actual January minimum temperature is 34. How does that
compare to temperatures in the prediction interval?

f) How does Richardson's actual January minimum temperature compare to a 90% confidence
interval for the **mean** temperature of all cities at the same latitude?
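Parts e) and f) contrast a prediction interval for a single new city with a confidence interval for the mean response. The sketch below shows the difference for a latitude-only fit. It assumes the t critical value is supplied by the caller (for example from a t table). The function name `intervals_at` and the data are hypothetical and are not taken from Temperature1.data.

```python
import math

def intervals_at(x, y, x0, t_crit):
    """Return (prediction, mean CI, prediction interval) at x0 for a one-predictor fit.

    t_crit is the two-sided t critical value with n - 2 degrees of freedom.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    yhat = intercept + slope * x0
    lev = 1 / n + (x0 - xbar) ** 2 / sxx
    half_ci = t_crit * s * math.sqrt(lev)      # uncertainty in the fitted mean only
    half_pi = t_crit * s * math.sqrt(1 + lev)  # adds the noise of one new observation
    return yhat, (yhat - half_ci, yhat + half_ci), (yhat - half_pi, yhat + half_pi)

# Made-up latitude/temperature pairs for illustration.
lat = [30.0, 33.5, 36.0, 39.5, 42.0, 45.0]
temp = [44.0, 37.0, 31.0, 24.0, 18.0, 12.0]

# 90% two-sided level with n - 2 = 4 degrees of freedom: t is about 2.132.
yhat, mean_ci, pred_int = intervals_at(lat, temp, x0=33.0, t_crit=2.132)
print(yhat, mean_ci, pred_int)
```

Both intervals are centered at the same fitted value. The prediction interval is always the wider of the two because of the extra 1 under the square root.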

3. The file

http://www.utdallas.edu/~ammann/stat6341scripts/OgleSMCV.csv

contains stellar magnitudes (luminosity) and log(period) for a family of variable stars called
Cepheid variables in the Small Magellanic Cloud. The first column of this file gives IDs for the
Cepheids and so can be used as row names. These variable stars are important to
astronomers because the periods of their variability (*logPeriod*) are directly related to
their luminosity. This enables astronomers to estimate distances of these stars from their periods.
Two types of Cepheids are contained in this data set, FU and FO, and these types have slightly
different period-luminosity relationships. **Note**: stellar magnitudes are reversed in the
sense that a higher magnitude value corresponds to a dimmer star. Also, *BV = B-V* and
*VI = V-I*, so those variables should be ignored.

a) Fit a model to predict *MV* based on *I,V,B,logPeriod,Type* that includes all
two-way interactions between *Type* and the other variables. Summarize this model and
include diagnostic plots to check assumptions.
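For reference, one way to write the model in part a) is shown below. The notation is an assumption made for illustration: $D_i$ is an indicator equal to 1 for one of the two types and 0 for the other (which type serves as the baseline depends on how the factor is coded), and $x_{ij}$ ranges over *I*, *V*, *B*, and *logPeriod*.

$$
MV_i = \beta_0 + \sum_{j} \beta_j x_{ij} + \gamma_0 D_i + \sum_{j} \gamma_j D_i x_{ij} + \varepsilon_i
$$

For the baseline type the intercept is $\beta_0$ and the slope on $x_j$ is $\beta_j$. For the other type they are $\beta_0 + \gamma_0$ and $\beta_j + \gamma_j$, so each $\gamma$ measures how much that coefficient differs between the two types.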

b) Define as high-residual outliers stars with studentized residuals greater than 3 in absolute
value, and define as high-leverage outliers stars with

where

c) High residual stars and high leverage stars may have been misclassified as FO or FU by the automated photometry software used by this study. For each of those stars use the reduced model to obtain predicted MV based on their values for

2017-12-10