**R** includes a built-in data set called HairEyeColor, a 3-dimensional frequency table
(hair color, eye color, and sex) generated from a group of 592 students. Here are some examples
of statements that can be made using this data.
**1**. The percentage of the group that are male.
**2**. The percentage of the group that have green eyes.
**3**. The percentage of males that have green eyes.
**4**. The percentage of those with green eyes who are male.
**5**. The percentage of those with brown eyes who are female.

Note that the reference group for the first two statements is the entire population of 592 students,
so these statements are equivalent to ordinary probabilities:
**1**. *P(Male)*
**2**. *P(Green Eyes)*

However, the reference groups for the remaining statements are not the entire population but
instead are subgroups. This makes those statements equivalent to conditional probabilities.
**3**. *P(Green Eyes | Male)*
**4**. *P(Male | Green Eyes)*
**5**. *P(Female | Brown Eyes)*
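
For reference, each conditional probability above is a ratio of counts taken within the reference subgroup. The notation $n(\cdot)$ for a count of students is introduced here for illustration and is not part of the original text.

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{n(A \cap B)}{n(B)}
$$

For statement 3, $A$ is "green eyes" and $B$ is "male", so the denominator is the number of males rather than all 592 students.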

These probabilities and conditional probabilities can be obtained as follows.

    nAll = sum(HairEyeColor)
    nMale = sum(HairEyeColor[,,"Male"])
    pMale = nMale/nAll
    cat(paste("1. P(Male) =", round(pMale,3)),"\n")
    nGreenEyes = sum(HairEyeColor[,"Green",])
    pGreenEyes = nGreenEyes/nAll
    cat(paste("2. P(Green Eyes) =", round(pGreenEyes,3)),"\n")
    nMaleGreenEyes = sum(HairEyeColor[,"Green","Male"])
    cat(paste("3. P(Green Eyes | Male) =", round(nMaleGreenEyes/nMale,3)),"\n")
    cat(paste("4. P(Male | Green Eyes) =", round(nMaleGreenEyes/nGreenEyes,3)),"\n")
    nBrownEyes = sum(HairEyeColor[,"Brown",])
    nFemaleBrownEyes = sum(HairEyeColor[,"Brown","Female"])
    cat(paste("5. P(Female | Brown Eyes) =", round(nFemaleBrownEyes/nBrownEyes,3)),"\n")
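
The same conditional probabilities can also be read off whole tables using base R's `margin.table` and `prop.table`. The following is a minimal sketch of that alternative, not part of the original analysis; the names `eyeBySex`, `pEyeGivenSex`, and `pSexGivenEye` are introduced here for illustration.

```r
# Collapse the 3-D table over Hair, leaving an Eye x Sex table of counts
eyeBySex = margin.table(HairEyeColor, c(2, 3))

# Divide each column (Sex) by its total: entries are P(Eye | Sex)
pEyeGivenSex = prop.table(eyeBySex, 2)

# Divide each row (Eye) by its total: entries are P(Sex | Eye)
pSexGivenEye = prop.table(eyeBySex, 1)

print(round(pEyeGivenSex, 3))
print(round(pSexGivenEye, 3))
```

Here `pEyeGivenSex["Green","Male"]` corresponds to statement 3, `pSexGivenEye["Green","Male"]` to statement 4, and `pSexGivenEye["Brown","Female"]` to statement 5.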

Next let's consider whether or not hair and eye color are related. First we will answer this for males and females combined. Here is the corresponding frequency table.

    cat("Hair and eye color frequency table for all students\n")
    HairEyeAll = apply(HairEyeColor,1:2,sum)
    print(HairEyeAll)

The expected frequencies under the assumption of independence are obtained by:

    R = apply(HairEyeAll,1,sum)   # row (hair color) totals
    C = apply(HairEyeAll,2,sum)   # column (eye color) totals
    Eall = outer(R,C)/nAll
    cat("Expected frequencies under independence\n")
    print(round(Eall,1))

Distance from independence is given by

    Dall = ((HairEyeAll - Eall)^2)/Eall
    cat(paste("Total distance from independence =",round(sum(Dall),3)),"\n")
    cat("Individual distances are given by:\n")
    print(round(Dall,3))

Let's repeat this just for males.

    cat("Hair and eye color frequency table for males\n")
    HairEyeMale = HairEyeColor[,,"Male"]
    print(HairEyeMale)
    R = apply(HairEyeMale,1,sum)
    C = apply(HairEyeMale,2,sum)
    Emale = outer(R,C)/nMale
    cat("Expected frequencies under independence for males\n")
    print(round(Emale,1))
    Dmale = ((HairEyeMale - Emale)^2)/Emale
    cat(paste("Total distance from independence for males =",round(sum(Dmale),3)),"\n")
    cat("Individual distances for males are given by:\n")
    print(round(Dmale,3))

Now repeat just for females.

    cat("Hair and eye color frequency table for females\n")
    HairEyeFemale = HairEyeColor[,,"Female"]
    print(HairEyeFemale)
    nFemale = sum(HairEyeFemale)
    R = apply(HairEyeFemale,1,sum)
    C = apply(HairEyeFemale,2,sum)
    Efemale = outer(R,C)/nFemale
    cat("Expected frequencies under independence for females\n")
    print(round(Efemale,1))
    Dfemale = ((HairEyeFemale - Efemale)^2)/Efemale
    cat(paste("Total distance from independence for females =",round(sum(Dfemale),3)),"\n")
    cat("Individual distances for females are given by:\n")
    print(round(Dfemale,3))

Note that the distances from independence of Blond hair are much higher for females than for males.
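
For reference, the quantities computed in each of the three analyses can be written compactly. The symbols $O_{ij}$ (observed count for hair color $i$ and eye color $j$) and $D_{ij}$ are notation introduced here; $R_i$, $C_j$, and $n$ match the row totals, column totals, and table total used in the code.

$$
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
D_{ij} = \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
D = \sum_{i,j} D_{ij}
$$

A cell contributes a large $D_{ij}$ when its observed count is far from $E_{ij}$ relative to the size of $E_{ij}$ itself.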

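The total distance computed by hand has the same form as Pearson's chi-squared statistic, so it can be cross-checked against base R's `chisq.test`. The following is a minimal sketch for the combined table only; the names `hairEyeAll`, `testAll`, `expectedByHand`, and `distanceByHand` are introduced here and are not part of the original analysis.

```r
# Collapse over Sex to get the Hair x Eye table for all students
hairEyeAll = apply(HairEyeColor, 1:2, sum)

# Pearson's chi-squared test of independence (stats package, loaded by default)
testAll = chisq.test(hairEyeAll)

# Expected counts by hand: row total * column total / table total
expectedByHand = outer(rowSums(hairEyeAll), colSums(hairEyeAll)) / sum(hairEyeAll)

# Total distance from independence by hand
distanceByHand = sum((hairEyeAll - expectedByHand)^2 / expectedByHand)

# Both comparisons should print TRUE
print(all.equal(unname(testAll$expected), unname(expectedByHand)))
print(all.equal(unname(testAll$statistic), distanceByHand))
```

The same check can be applied to `HairEyeColor[,,"Male"]` and `HairEyeColor[,,"Female"]`, although `chisq.test` may warn when some expected counts are small.
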
2018-10-18