This course introduces students to methods for visualizing the information contained in
data sets and for making inferences from them. We will use the statistical programming
language R for this. R is open-source software that is free to download and use. It is
developed and maintained by an international group of research statisticians, so it reflects
the state of the art in statistical computing. Self-installing executables are available at the main R website:
Follow the link corresponding to your operating system (Windows, Mac OS, or Linux) to download the self-installing executable.
R may seem intimidating at first because it is not point-and-click software like Excel; it is a statistical programming language. The payoff for students who learn to use it is that it adds considerable value to a résumé. Data scientists are in great demand today and are likely to remain so for the foreseeable future. Many job descriptions in this field state that experience with R is very helpful.
One of the reasons R has become so widely accepted, in addition to its price, is that
it has extensive graphics capabilities. To get a taste of those, start R and enter
demo("graphics") into the command window. In R you can build whatever graphic you need to visualize particular aspects of a data set, rather than having to choose from a limited list of graph types. This flexibility results in a steeper learning curve, but plenty of example scripts will be provided to give you concrete starting points. Other demos you can view include
demo("colors")
demo("hclColors")
demo("image")
demo("persp")
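To illustrate what building your own graphic can look like, here is a minimal sketch that uses the cars data set that ships with R (speed in mph and stopping distance in feet). The choice of data set, colors, and labels is only an assumption made for illustration; it is not taken from the course scripts.

```r
# Scatterplot of the built-in 'cars' data: stopping distance against speed.
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance vs. speed",
     pch = 19, col = "steelblue")

# Fit a least-squares line and draw it on top of the points.
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "firebrick", lwd = 2)

# Add a legend that identifies the points and the fitted line.
legend("topleft",
       legend = c("Observed", "Least-squares fit"),
       col = c("steelblue", "firebrick"),
       pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2),
       bty = "n")
```

Each call adds one layer to the plot, so you can change or remove a layer without touching the others.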
Another major advantage of R is that commands can be saved in a plain text file, called a script, which R reads and then runs from top to bottom. This is helpful because you can keep those files for future use. If you need to do something similar to what you have already done and saved, you can make a few changes to the file, if necessary, and apply it to your new data. That way you won't need to remember the specific sequence of clicks or commands that you used in the past; you simply open the file to see what you did.
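The script workflow described above can be sketched as follows. The file name summary_script.R and the variable names are hypothetical, and the script is written from within R here only so that the example is self-contained; normally you would type the commands into a file with a text editor.

```r
# Create a small script file in R's temporary directory (hypothetical name).
script_file <- file.path(tempdir(), "summary_script.R")
writeLines(c(
  "# Summarize and plot a numeric vector named 'x'",
  "x_mean <- mean(x)",
  "x_sd <- sd(x)",
  "hist(x, main = 'Histogram of x')"
), script_file)

# Supply the data, then have R read the file and run its commands.
x <- cars$speed
source(script_file)

# The objects created by the script are now available.
x_mean
x_sd
```

To reuse the script on new data, assign a different vector to x and call source(script_file) again.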
These assigned exercises do not need to be turned in for grading. However, the sooner you complete this assignment, the sooner you will be able to understand the scripts that are used for examples presented in class, and the sooner you will be able to begin work on the graded projects that will be assigned. This assignment refers to the introductory guide to R, Owen-TheRGuide.pdf. Additional information to supplement this guide is given in the Notes section below. Other chapters from this guide will be added as we progress through the course.
Class discussion item
Some web sites contain polls for visitors to express opinions about a selected issue or answer a question. For example,
usually has such a poll near the bottom of the page under the heading SPORTSNATION. The question that we will discuss in class on September 2 is: what can be learned from this information after collecting responses over a 24-hour period? In particular, can we infer anything about sports fans in general, or at least about people who visit that site? To obtain some background on this question, read the section titled Presidential poll at