
Crabs data

Some of the graphical tools available in R are illustrated in the script file
http://www.utdallas.edu/~ammann/stat3355scripts/carbsGraph.r
Functions used by this script are described below.

read.table(). If the data set for a project is not small, it is most convenient to read the data into R from a tabular data file in which each row corresponds to an individual and the columns contain the measurements associated with each individual. These files must be plain text (not created by a document processor such as Word). If the data come from a database or spreadsheet, the simplest approach is to have the database or spreadsheet export the data to a comma-separated values (csv) file. An example is given by the file
http://www.utdallas.edu/~ammann/stat3355scripts/crabs.csv

a. The first argument is the name of the data file. This must be a string that contains the full path to the file if it is not in the startup directory, or it may be an internet address if the file is on a remote server.
b. The first row of the crabs.csv file contains names for the columns. This row is referred to as a header and requires use of the header=TRUE argument.
c. The values in each row are separated by a comma. The default separator is white space, so the argument sep="," is needed for the crabs data file.

read.table() will return an error message if it finds that the rows don't all contain the same number of values. This can occur, for example, if a csv file was created from an Excel file that had some extraneous blank cells. Otherwise, read.table() returns a data frame that is assigned to the name Crabs.
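
As a minimal sketch of the arguments described above, the following reads a small stand-in file rather than the real crabs.csv. The four data rows are made up for illustration, and only the column names Species, Gender and FL are taken from the text; the temporary file and the object name ragged are assumptions of this example. The last call shows what happens when a row has too few values.

```r
# Hypothetical miniature stand-in for crabs.csv (the values are made up).
csv_lines <- c(
  "Species,Gender,FL",
  "B,M,8.1",
  "B,F,7.2",
  "O,M,9.1",
  "O,F,10.7"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# First argument: file name; header=TRUE: first row holds column names;
# sep=",": values are separated by commas.
Crabs <- read.table(tmp, header = TRUE, sep = ",")
str(Crabs)

# A row with a missing value makes read.table() stop with an error.
# tryCatch() captures the error so the script can continue.
ragged <- tryCatch(
  read.table(text = "Species,Gender,FL\nB,M,8.1\nB,F", header = TRUE, sep = ","),
  error = function(e) e
)
print(conditionMessage(ragged))
```

To read the real data, the first argument would be replaced by the path or internet address of crabs.csv.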

Note that the first two columns, named Species and Gender, respectively, are strings, not numeric values. In versions of R before 4.0.0, read.table() assumes such columns are categorical variables and converts each of them automatically to a factor. In R 4.0.0 and later, strings are kept as character vectors by default, so the argument stringsAsFactors=TRUE is needed to obtain this conversion. The unique values of a factor are referred to as its levels. The levels of Species are B and O (for blue and orange), and the levels of Gender are M and F.
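
The factor conversion can be checked with a short sketch like the one below. The data rows are made up for illustration, and stringsAsFactors=TRUE is passed explicitly so that the result does not depend on the default of the installed R version.

```r
# Hypothetical rows in the same layout as crabs.csv (the values are made up).
Crabs <- read.table(
  text = "Species,Gender,FL\nB,M,8.1\nB,F,7.2\nO,M,9.1\nO,F,10.7",
  header = TRUE, sep = ",",
  stringsAsFactors = TRUE   # convert string columns to factors explicitly
)

is.factor(Crabs$Species)   # TRUE
levels(Crabs$Species)      # "B" "O"
levels(Crabs$Gender)       # "F" "M"  (levels are stored in sorted order)
table(Crabs$Species, Crabs$Gender)   # counts for each Species/Gender pair
```

Note that levels() reports the levels in sorted order, so Gender is listed as F, M even though M appears first in the data.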

A particular column of a data frame can be accessed by the name of the data frame, followed by a dollar sign, followed by the name of the column. So, for example,

Crabs$FL
refers to the column with that name. You could obtain a histogram of that column by
hist(Crabs$FL)
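
A slightly fuller sketch of column access and hist() follows. The data frame here is built by hand with made-up FL values, and the title and axis label are illustrative choices, not taken from the course script.

```r
# Hypothetical data frame standing in for Crabs (the FL values are made up).
Crabs <- data.frame(
  Species = c("B", "B", "O", "O", "O"),
  FL      = c(8.1, 7.2, 9.1, 10.7, 12.3)
)

# The $ operator extracts one column as an ordinary vector.
fl <- Crabs$FL
summary(fl)

# hist() draws the histogram and invisibly returns the bin information.
h <- hist(Crabs$FL, main = "Histogram of FL", xlab = "FL")
h$counts   # number of observations in each bin
```

When run non-interactively with Rscript, the plot is written to a file such as Rplots.pdf instead of a window.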


Larry Ammann
2015-08-30