Due date: Feb. 19, 2018
**Problem 1**. Use the data contained in the file

http://www.utdallas.edu/~ammann/stat3355scripts/TrackRecords.csv

This data represents the national record times for males in track races. The first column gives
records for the 100 meter race, etc.
**a)** Find the means, medians, and standard deviations for each race.
**b)** Which countries are more than 2 sd's below the mean for the 100 meter race? Which are
more than 2 sd's above the mean for the 400 meter race?
**c)** Which countries are in the lowest 10% of record times for the 800 meter race? Which
are in the highest 10% for the Marathon?
**d)** Plot 400 meter record times versus 100 meter record times and add an informative title.
Use filled circles colored red for USA and black for other countries. Find and interpret the
correlation between these times. Obtain the least squares regression line to predict 400 meter
record times based on 100 meter times and superimpose this line on the plot (see documentation for
the **R** function *abline()*). Add text below your main title that reports r-squared for these
variables.
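
The techniques needed for parts a) through d) can be sketched on a small made-up data frame rather than on the actual TrackRecords file. The data frame `toy`, its column names `m100` and `m400`, the row labels, and all values are hypothetical placeholders. The real file's column names may differ.

```r
# Hypothetical toy data: all values are invented for illustration only.
toy <- data.frame(
  m100 = c(10.1, 10.3, 10.6, 10.2, 10.9, 10.4),
  m400 = c(45.0, 45.9, 47.2, 45.4, 48.1, 46.3),
  row.names = c("A", "B", "C", "D", "USA", "F")
)

# a) Column-wise means, medians, and standard deviations.
col_means   <- sapply(toy, mean)
col_medians <- sapply(toy, median)
col_sds     <- sapply(toy, sd)

# b) Standardize a column, then select rows more than 2 sd's below the mean.
z100 <- (toy$m100 - mean(toy$m100)) / sd(toy$m100)
far_below <- rownames(toy)[z100 < -2]

# c) Rows at or below the 10th percentile of a column.
cut10 <- quantile(toy$m400, 0.10)
lowest10 <- rownames(toy)[toy$m400 <= cut10]

# d) Scatterplot with one row highlighted, fitted line, and r-squared.
fit <- lm(m400 ~ m100, data = toy)
r2 <- cor(toy$m100, toy$m400)^2
pt_col <- ifelse(rownames(toy) == "USA", "red", "black")
plot(m400 ~ m100, data = toy, pch = 19, col = pt_col,
     main = "400 m vs 100 m record times (toy data)")
abline(fit)                                   # superimpose the regression line
mtext(paste("r-squared =", round(r2, 3)), side = 3, line = 0.25)
```

Here *mtext()* with `side = 3` writes in the top margin, just under the main title. For simple linear regression, the squared correlation equals the r-squared reported by `summary(fit)`.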

**Problem 2**. Use the data contained in the file

http://www.utdallas.edu/~ammann/stat3355scripts/Sleep.data

A description of this data is given in

http://www.utdallas.edu/~ammann/stat3355scripts/Sleep.txt

The *Species* column should be used as row names.
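
One way to use a column as row names is the `row.names` argument of *read.table()*. The sketch below reads from an inline string so that it runs without network access. The text in `txt` is an invented stand-in, and it assumes a whitespace-delimited layout with a header line, which may not match the real file.

```r
# Hypothetical stand-in for the file contents; the values are invented.
txt <- "Species BodyWgt BrainWgt
Alpha 1.5 10
Beta 200 400"

# row.names may be given as the name of the column that holds the row names.
Toy <- read.table(text = txt, header = TRUE, row.names = "Species")
```

To read an actual file, the `text = txt` argument would be replaced by the file path or URL as the first argument, under the same layout assumption.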
**a)** Construct histograms of each variable and put them on the same graphics page.
**b)** The strong asymmetry for all variables except *Sleep* indicates that a
*log* transformation is appropriate for those variables. Construct a new data frame that
contains *Sleep*, replaces *BodyWgt, BrainWgt, LifeSpan* by their log-transformed
values, and then construct histograms of each variable in this new data frame.
**c)** Plot *LifeSpan* vs *BrainWgt* with *LifeSpan* on the y-axis and
include an informative title. Repeat but use the log-transformed variables instead. Superimpose
lines corresponding to the respective means of the variables for each plot.
**d)** What proportion of species are within 2 s.d.'s of mean *LifeSpan*? What
proportion are within 2 s.d.'s of mean *BrainWgt*? Answer these for the original variables and
for the log-transformed variables.
**e)** Obtain and interpret the correlation between *LifeSpan* and *BrainWgt*.
Repeat for *log(LifeSpan)* and *log(BrainWgt)*.
**f)** Obtain the least squares regression line to predict *LifeSpan* based on
*BrainWgt*. Repeat to predict *log(LifeSpan)* based on *log(BrainWgt)*.
Predict *LifeSpan* of *Homo sapiens* based on each of these regression lines. Which
would you expect to have the best overall accuracy? Which prediction is closest to the actual
*LifeSpan* of *Homo sapiens*?
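
The steps in this problem can be sketched on simulated data as follows. The data frame `toy`, the helper `within2()`, the simulated values, and the new brain weight of 50 are all hypothetical. Only the column names are borrowed from the problem statement, and nothing here comes from the Sleep data itself.

```r
# Hypothetical toy data with skewed, positive values (simulated, not real).
set.seed(1)
toy <- data.frame(
  Sleep    = rnorm(30, mean = 10, sd = 3),
  BrainWgt = exp(rnorm(30, mean = 3, sd = 1.5))
)
toy$LifeSpan <- exp(1 + 0.3 * log(toy$BrainWgt) + rnorm(30, sd = 0.3))

# a) Histograms of every variable on one graphics page.
op <- par(mfrow = c(1, 3))
for (v in names(toy)) hist(toy[[v]], main = v, xlab = v)
par(op)                                       # restore the previous layout

# c) Scatterplot with lines at the means of both variables.
plot(LifeSpan ~ BrainWgt, data = toy, main = "LifeSpan vs BrainWgt (toy data)")
abline(h = mean(toy$LifeSpan), v = mean(toy$BrainWgt), lty = 2)

# d) Proportion of values within 2 sd's of the mean.
within2 <- function(x) mean(abs(x - mean(x)) <= 2 * sd(x))
p_raw <- within2(toy$BrainWgt)
p_log <- within2(log(toy$BrainWgt))

# f) Regression on the log scale; the prediction is back-transformed with exp().
fit_log <- lm(log(LifeSpan) ~ log(BrainWgt), data = toy)
new <- data.frame(BrainWgt = 50)              # hypothetical new observation
pred <- exp(predict(fit_log, newdata = new))
```

Because the model formula contains `log(BrainWgt)`, *predict()* expects `newdata` to hold the untransformed variable and applies the log itself.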

**Note**: if **X** is the name of a data frame in **R** that contains two variables, say
*x1* and *x2*, and you would like to create a new data frame with log-transformed values of the
variables in **X**, then you can create a new object, named for example **X1**, that is assigned
the value **X** and then log-transform the variables in this new data frame.

    X1 = X
    names(X1) = paste("logx",1:2,sep="")
    X1$logx1 = log(X$x1)
    X1$logx2 = log(X$x2)
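
A runnable version of the note, using a small invented data frame, might look like this. The values in `X` are hypothetical. The variable names *x1* and *x2* follow the code in the note.

```r
# Hypothetical data frame with two positive variables named x1 and x2.
X <- data.frame(x1 = c(1, 10, 100), x2 = c(2, 20, 200))

X1 <- X                                      # start from a copy of X
names(X1) <- paste("logx", 1:2, sep = "")    # rename columns to logx1, logx2
X1$logx1 <- log(X$x1)                        # overwrite with log-transformed values
X1$logx2 <- log(X$x2)
```

The original data frame **X** is left unchanged, so both the raw and the log-transformed versions remain available.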

2018-02-14