**Due date**: Feb. 16, 2016. Don't just give answers to these problems. Consider them as if
they were job assignments given to you by your supervisor and add some brief discussion and
interpretation of the results. Create a document that contains your answers and graphics and email
a pdf version to me with the subject line: Stat 3355 homework 1. If you use Word, please save a
copy as a pdf file. Don't send the original Word document.

- Use the data contained in the file

http://www.utdallas.edu/~ammann/stat3355scripts/TrackRecords.csv

This data represents the national record times for males in track races. The first column gives records for the 100 meter race, etc.

- Find the means and standard deviations for each race.
- Which countries are more than 2 sd's above the mean for the 100 meter race? Which are more than 2 sd's above the mean for the Marathon?
- Which countries are in the lowest 10% of record times for the 100 meter race? Which are in the highest 10% for the 1500 meter race? (See documentation for the **R** function *quantile()*.)
- Plot Marathon record times versus 800 meter record times and add an informative title. Find and interpret the correlation between these times. Obtain the least squares regression line to predict Marathon record times based on 800 meter times and superimpose this line on the plot. Add text below your main title that reports r-square for these variables.
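
As a hint for the percentile and "2 sd's above the mean" questions, here is a minimal sketch on made-up numbers (not the TrackRecords data). The vector `times` and its one-letter labels are hypothetical stand-ins for a column of record times with countries as names.

```r
# Hypothetical record times in seconds, named by made-up country labels.
times <- c(A = 10.1, B = 10.4, C = 9.9, D = 10.8, E = 10.2,
           F = 10.6, G = 10.0, H = 10.3, I = 10.5, J = 10.7)

# 10th percentile: values at or below this cutoff are in the lowest 10%.
cutoff_low <- quantile(times, probs = 0.10)
lowest <- names(times)[times <= cutoff_low]

# Cutoff for "more than 2 standard deviations above the mean".
cutoff_high <- mean(times) + 2 * sd(times)
far_above <- names(times)[times > cutoff_high]

print(lowest)
print(far_above)
```

For the highest 10%, the same idea applies with `probs = 0.90` and `>=` in place of `<=`.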

- Use the data contained in the file

http://www.utdallas.edu/~ammann/stat3355scripts/Sleep.data

A description of this data is given in

http://www.utdallas.edu/~ammann/stat3355scripts/Sleep.txt

The *Species* column should be used as row names.

- Construct histograms of each variable.
- The strong asymmetry for all variables except *Sleep* indicates that a *log* transformation is appropriate for those variables. Construct a new data frame that contains *Sleep*, replaces *BodyWgt, BrainWgt, LifeSpan* by their log-transformed values, and then construct histograms of each variable in this new data frame.
- Plot *LifeSpan* vs *BodyWgt* with *LifeSpan* on the y-axis and include an informative title. Repeat but use the log-transformed variables instead. Superimpose lines corresponding to the respective means of the variables for each plot.
- What proportion of species are within 2 s.d.'s of mean *LifeSpan*? What proportion are within 2 s.d.'s of mean *BodyWgt*? Answer these for the original variables and for the log-transformed variables.
- Obtain and interpret the correlation between *LifeSpan* and *BodyWgt*. Repeat for *log(LifeSpan)* and *log(BodyWgt)*.
- Obtain the least squares regression line to predict *LifeSpan* based on *BodyWgt*. Repeat to predict *log(LifeSpan)* based on *log(BodyWgt)*. Predict *LifeSpan* of Homo sapiens based on each of these regression lines. Which would you expect to have the best overall accuracy? Which prediction is closest to the actual *LifeSpan* of Homo sapiens?
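
One point that is easy to miss in the last item: a regression fitted to log-transformed variables predicts the log of the response, so the prediction has to be exponentiated to get back to the original units. A minimal sketch on made-up numbers (not the Sleep data), where `demo`, `w`, and `life` are hypothetical names:

```r
# Made-up body weights and life spans, for illustration only.
demo <- data.frame(w    = c(0.05, 0.5, 3, 20, 60, 500),
                   life = c(3, 6, 12, 20, 28, 50))

# Least squares fit on the log scale.
fit_log <- lm(log(life) ~ log(w), data = demo)

# predict() returns a value on the log scale for the new weight ...
new_obs  <- data.frame(w = 62)
pred_log <- predict(fit_log, newdata = new_obs)

# ... so exp() is needed to express it in the original units.
pred_life <- exp(pred_log)
print(pred_life)
```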

- Use the data contained in the file

http://www.utdallas.edu/~ammann/stat3355scripts/HappyPlanet1.csv

This data comes from the *Happy Planet Index*, http://www.happyplanetindex.org

Note that one of the countries is `Cote d'Ivoire`, which requires use of the `quote=` argument in `read.csv()`: `quote="\""`
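
A minimal sketch of the `quote=` argument, using a made-up two-row table passed through the `text=` argument instead of the real file. The column names and values here are hypothetical. In the R documentation, `"\""` is also listed as the default for `read.csv()`, while `read.table()` additionally treats `'` as a quote character, so stating it explicitly is harmless and protects against the apostrophe either way.

```r
# Made-up CSV text for illustration; the real data are in HappyPlanet1.csv.
csv_text <- "Country,Footprint\nCote d'Ivoire,1.0\nFrance,4.9"

# Only the double quote is treated as a quoting character, so the
# apostrophe in Cote d'Ivoire is read as an ordinary character.
demo <- read.csv(text = csv_text, quote = "\"")

print(as.character(demo$Country))
```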

- Obtain the quartiles of *Footprint*.
- Construct a histogram of *Footprint*. Obtain the mean and s.d. of *Footprint*. How many countries are within 2 s.d.'s of the mean for *Footprint*?
- Since *Footprint* is heavily skewed, construct a new variable called *logFootprint* that is the natural logarithm of *Footprint* (base *e*). Answer the previous two items for this variable. Are the quartiles of *logFootprint* the same as the logarithm of the quartiles of *Footprint*? What about the mean?
- Plot *WellBeing* vs *logFootprint*, use different colors for different regions, include an informative title, and include a legend that indicates which color corresponds to which region.
- Find the correlation between *WellBeing* and *logFootprint* and interpret it. Add text below the main title of the plot that reports r-square for these variables.
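
For the plotting mechanics (colors by group, a legend, and a line of text under the main title), one possible approach is sketched below on made-up data. The data frame `demo`, its columns, and the color choices are all hypothetical, and `mtext()` is only one of several ways to place text under a title.

```r
# Made-up data standing in for two numeric variables and a grouping factor.
demo <- data.frame(
  x = c(0.2, 0.5, 0.9, 1.1, 1.4, 1.8),
  y = c(4.1, 4.6, 5.2, 5.9, 6.3, 7.0),
  region = factor(c("A", "A", "B", "B", "C", "C"))
)

# r-square is the squared correlation between the two variables.
r2 <- cor(demo$x, demo$y)^2
r2_label <- paste("R-squared =", round(r2, 3))

pdf(NULL)  # null device so the sketch runs without opening a window
region_cols <- c("red", "blue", "darkgreen")  # one color per factor level
plot(demo$x, demo$y, col = region_cols[as.integer(demo$region)], pch = 19,
     xlab = "x", ylab = "y", main = "Made-up y versus x by region")
mtext(r2_label, side = 3, line = 0.25)  # top margin, under the main title
abline(lm(y ~ x, data = demo))          # least squares line
legend("topleft", legend = levels(demo$region), col = region_cols, pch = 19)
dev.off()
```

Indexing the color vector by `as.integer(region)` keeps the point colors and the legend entries in the same order, since both follow the factor levels.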


2016-02-04