Next: Continuous Random Variables Up: Additional Properties of Probability Previous: Additional Properties of Probability

Example

R includes a built-in data set called HairEyeColor, a 3-dimensional frequency table (hair color by eye color by sex) compiled from a group of 592 students. Here are some examples of statements that can be made using these data.
1. The percentage of the group that are male.
2. The percentage of the group that have green eyes.
3. The percentage of males that have green eyes.
4. The percentage of those with green eyes who are male.
5. The percentage of those with brown eyes who are female.
Note that the reference group for the first two statements is the entire population of 592 students, so these statements are equivalent to ordinary probabilities:
1. P(Male)
2. P(Green Eyes)
However, the reference groups for the remaining statements are not the entire population but instead are subgroups. This makes those statements equivalent to conditional probabilities.
3. P(Green Eyes | Male)
4. P(Male | Green Eyes)
5. P(Female | Brown Eyes)
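As a brief sketch of why the reference group matters: a conditional probability divides the count in the joint category by the count in the reference subgroup rather than by the full group size. The notation n(.) for a count is introduced here for illustration and does not appear in the original notes.

```latex
P(\text{Green Eyes} \mid \text{Male})
  = \frac{P(\text{Green Eyes} \cap \text{Male})}{P(\text{Male})}
  = \frac{n(\text{Green Eyes} \cap \text{Male}) / 592}{n(\text{Male}) / 592}
  = \frac{n(\text{Green Eyes} \cap \text{Male})}{n(\text{Male})}
```

The group size cancels, which is why a conditional probability can be computed directly as a ratio of two counts.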
These probabilities and conditional probabilities can be obtained as follows.

nAll = sum(HairEyeColor)
nMale = sum(HairEyeColor[,,"Male"])
pMale = nMale/nAll
cat(paste("1. P(Male) =", round(pMale,3)),"\n")
nGreenEyes = sum(HairEyeColor[,"Green",])
pGreenEyes = nGreenEyes/nAll
cat(paste("2. P(Green Eyes) =", round(pGreenEyes,3)),"\n")
nMaleGreenEyes = sum(HairEyeColor[,"Green","Male"])
cat(paste("3. P(Green Eyes | Male) =", round(nMaleGreenEyes/nMale,3)),"\n")
cat(paste("4. P(Male | Green Eyes) =", round(nMaleGreenEyes/nGreenEyes,3)),"\n")
nBrownEyes = sum(HairEyeColor[,"Brown",])
nFemaleBrownEyes = sum(HairEyeColor[,"Brown","Female"])
cat(paste("5. P(Female | Brown Eyes) =", round(nFemaleBrownEyes/nBrownEyes,3)),"\n")
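The same quantities can also be read off normalized tables. The following is a minimal sketch, not the method used in these notes, that assumes the base R functions margin.table and prop.table; the variable names eyeSex, pEyeGivenSex, and pSexGivenEye are chosen here for illustration.

```r
# Two-way table of eye color by sex, summing over hair color
# (the dimensions of HairEyeColor are Hair, Eye, Sex).
eyeSex <- margin.table(HairEyeColor, c(2, 3))

# Ordinary probability: marginal count divided by the grand total.
pMale <- sum(eyeSex[, "Male"]) / sum(eyeSex)

# Conditional probabilities: normalize within the conditioning group.
pEyeGivenSex <- prop.table(eyeSex, margin = 2)  # each Sex column sums to 1
pSexGivenEye <- prop.table(eyeSex, margin = 1)  # each Eye row sums to 1

cat("P(Male) =", round(pMale, 3), "\n")
cat("P(Green Eyes | Male) =", round(pEyeGivenSex["Green", "Male"], 3), "\n")
cat("P(Male | Green Eyes) =", round(pSexGivenEye["Green", "Male"], 3), "\n")
cat("P(Female | Brown Eyes) =", round(pSexGivenEye["Brown", "Female"], 3), "\n")
```

Normalizing by column or by row makes the choice of reference group explicit: the margin argument names the dimension that is being conditioned on.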

Next let's consider whether or not hair and eye color are related. First we will answer this for males and females combined. Here is the corresponding frequency table.

cat("Hair and eye color frequency table for all students\n")
HairEyeAll = apply(HairEyeColor,1:2,sum)
print(HairEyeAll)
The expected frequencies under the assumption of independence are obtained by:
R = apply(HairEyeAll,1,sum)
C = apply(HairEyeAll,2,sum)
Eall = outer(R,C)/nAll
cat("Expected frequencies under independence\n")
print(round(Eall,1))
The distance of each cell from independence is measured by (observed - expected)^2 / expected:
Dall = ((HairEyeAll - Eall)^2)/Eall
cat(paste("Total distance from independence =",round(sum(Dall),3)),"\n")
cat("Individual distances are given by:\n")
print(round(Dall,3))
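The total distance computed above is the same quantity as Pearson's chi-squared statistic, so it can be cross-checked against R's chisq.test. This is a sketch of one way to do that check, not part of the original notes; the name chiAll is introduced here, and the table is rebuilt so the block runs on its own.

```r
# Rebuild the combined hair-by-eye table so this block is self-contained.
HairEyeAll <- apply(HairEyeColor, 1:2, sum)

# Pearson's chi-squared test of independence on the two-way table.
chiAll <- chisq.test(HairEyeAll)

# Expected counts under independence: (row total * column total) / grand total.
print(round(chiAll$expected, 1))

# The statistic is the sum over cells of (observed - expected)^2 / expected.
cat("Chi-squared statistic =", round(unname(chiAll$statistic), 3), "\n")
cat("Degrees of freedom =", unname(chiAll$parameter), "\n")
```

If the hand computation is correct, the printed expected counts and statistic should agree with Eall and sum(Dall).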
Let's repeat this just for males.
cat("Hair and eye color frequency table for males\n")
HairEyeMale = HairEyeColor[,,"Male"]
print(HairEyeMale)
R = apply(HairEyeMale,1,sum)
C = apply(HairEyeMale,2,sum)
Emale = outer(R,C)/nMale
cat("Expected frequencies under independence for males\n")
print(round(Emale,1))
Dmale = ((HairEyeMale - Emale)^2)/Emale
cat(paste("Total distance from independence for males =",round(sum(Dmale),3)),"\n")
cat("Individual distances for males are given by:\n")
print(round(Dmale,3))
Now repeat just for females.
cat("Hair and eye color frequency table for females\n")
HairEyeFemale = HairEyeColor[,,"Female"]
print(HairEyeFemale)
nFemale = sum(HairEyeFemale)
R = apply(HairEyeFemale,1,sum)
C = apply(HairEyeFemale,2,sum)
Efemale = outer(R,C)/nFemale
cat("Expected frequencies under independence for females\n")
print(round(Efemale,1))
Dfemale = ((HairEyeFemale - Efemale)^2)/Efemale
cat(paste("Total distance from independence for females =",round(sum(Dfemale),3)),"\n")
cat("Individual distances for females are given by:\n")
print(round(Dfemale,3))
Note that the individual distances from independence in the Blond row, especially for brown and blue eyes, are much larger for females than for males.
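Because the same three steps (expected counts, per-cell distances, total) are repeated for each table, they can be wrapped in a small function. The helper indepDistance below is hypothetical and is offered only as a sketch of how the male and female results might be compared side by side.

```r
# Hypothetical helper (not part of the original notes): returns expected
# counts and per-cell distances from independence for a two-way table.
indepDistance <- function(tab) {
  n <- sum(tab)
  E <- outer(rowSums(tab), colSums(tab)) / n  # expected counts under independence
  D <- (tab - E)^2 / E                        # per-cell distance from independence
  list(expected = E, distance = D, total = sum(D))
}

male   <- indepDistance(HairEyeColor[, , "Male"])
female <- indepDistance(HairEyeColor[, , "Female"])

# Compare the Blond rows of the two distance tables.
print(round(rbind(Male   = male$distance["Blond", ],
                  Female = female$distance["Blond", ]), 3))
cat("Total distance, males:  ", round(male$total, 3), "\n")
cat("Total distance, females:", round(female$total, 3), "\n")
```

Stacking the two Blond rows makes it easy to see which eye colors account for the difference between the sexes.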


ammann
2018-10-18