Note: This assignment will not be collected for grading; we will complete it in class. Students will benefit most if they attempt it before we discuss it in class.
Instructions. Graphics should be imported into your document, not submitted separately. Plots should be informative with titles, appropriate axis labels, and legends if needed.
Use data contained in the file
These data give information on smoking and cancer rates by state for 2010. The first column of the file contains state abbreviations, which should be used as row names by passing the row.names=1 argument to read.table().
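As a minimal sketch of how row.names=1 behaves, the example below reads a small invented table from a string rather than from the assignment file. The column names (State, Sales, Pop, Price) and all values are assumptions made up for illustration; they are not taken from the course data.

```r
# Toy stand-in for a data file: names and values are invented for illustration.
toy_text <- "State Sales Pop Price
AA 120 2.0 5.10
BB 300 4.0 4.25
CC  90 1.5 6.00"

# row.names = 1 tells read.table() to use the first column (State)
# as the row names instead of keeping it as a data column.
toy <- read.table(text = toy_text, header = TRUE, row.names = 1)

rownames(toy)   # the abbreviations are now row names
toy["BB", ]     # so rows can be selected by abbreviation
```

With a real file, the text= argument would be replaced by the file name as the first argument; the header setting depends on whether the file has a header line.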
1. Create a new data frame that contains the following variables:
CigSalesRate = FY2010 Sales per 100,000 population,
CigPrice, CigYouthRate, CigAdultRate, LungCancerRate, Region.
2. Construct plots that show how lung cancer rates depend on each of the variables CigSalesRate, CigPrice, CigYouthRate, and CigAdultRate. Put these plots on the same graphic page using the graphical parameter mfrow. Use filled circles as the plotting characters, and color the points black for all states except Texas, which should be red.
3. Construct plots that show how CigSalesRate, CigYouthRate, CigAdultRate, and LungCancerRate depend on Region. Put these plots on the same graphic page.
4. Find the means and standard deviations for CigSalesRate, CigYouthRate, CigAdultRate, and LungCancerRate. Put them into a table that includes the variable names in addition to the means and standard deviations.
5. Which states are more than 2 standard deviations above the mean for CigSalesRate?
6. Which states are in the lowest 10% of LungCancerRate? (See the documentation for the R function quantile().) Which state has the highest LungCancerRate?
7. What are the percentile rankings of Texas for the variables CigSalesRate, CigPrice, CigYouthRate, CigAdultRate, LungCancerRate?
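The techniques that items 1 through 7 call for can be sketched on a small invented data set. Everything below is an assumption for illustration: the eight "states", the column names (Sales, Pop, Cancer, Region, SalesRate), the values, and the use of "TX" as the row name for Texas. The rate is computed as Sales divided by a population already expressed in units of 100,000, which may not match how the assignment file stores these quantities. The percentile ranking shown (percentage of values at or below the given value) is only one of several common definitions.

```r
# Invented toy data: eight hypothetical states with made-up values.
d <- data.frame(
  Sales  = c(500, 800, 300, 3200, 650, 420, 180, 110),
  Pop    = c(10, 20, 5, 8, 13, 7, 4, 2),   # assumed to be in units of 100,000
  Cancer = c(60, 75, 48, 95, 66, 52, 58, 70),
  Region = factor(c("S", "S", "W", "S", "W", "S", "W", "W")),
  row.names = c("AA", "BB", "CC", "DD", "EE", "TX", "FF", "GG")
)

# Derived variable: sales per 100,000 population (under the assumption above).
d$SalesRate <- d$Sales / d$Pop

# Two plots on one page; filled circles (pch = 16), red for TX, black otherwise.
op <- par(mfrow = c(1, 2))
cols <- ifelse(rownames(d) == "TX", "red", "black")
plot(d$SalesRate, d$Cancer, pch = 16, col = cols,
     main = "Cancer rate vs. sales rate",
     xlab = "Sales per 100,000", ylab = "Cancer rate")
legend("bottomright", legend = c("TX", "Other"), pch = 16,
       col = c("red", "black"))
# With a factor on the right-hand side, a boxplot shows dependence on group.
boxplot(Cancer ~ Region, data = d,
        main = "Cancer rate by region",
        xlab = "Region", ylab = "Cancer rate")
par(op)  # restore the previous graphical settings

# Table of means and standard deviations with variable names.
vars <- c("SalesRate", "Cancer")
tab <- data.frame(Variable = vars,
                  Mean = sapply(d[vars], mean),
                  SD   = sapply(d[vars], sd),
                  row.names = NULL)

# Rows more than 2 standard deviations above the mean.
high <- rownames(d)[d$SalesRate > mean(d$SalesRate) + 2 * sd(d$SalesRate)]

# Rows at or below the 10th percentile, and the row with the maximum.
low <- rownames(d)[d$Cancer <= quantile(d$Cancer, 0.10)]
top <- rownames(d)[which.max(d$Cancer)]

# Percentile ranking of TX: percentage of values at or below TX's value.
tx_pct <- 100 * mean(d$Cancer <= d["TX", "Cancer"])
```

Logical indexing on rownames() is used throughout so that the answers come back as state abbreviations rather than row positions.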