Casino Games Activity 3
Data and statistics
What is a box and whisker plot?
The following table reports the average monthly temperatures for San Francisco, California and for Raleigh, North Carolina. Dotplots of these twelve temperatures for each city appear below.

| City    | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| Raleigh | 39  | 42  | 50  | 59  | 67  | 74  | 78  | 77  | 71  | 60  | 51  | 43  |
| S.F.    | 49  | 52  | 53  | 56  | 58  | 62  | 63  | 64  | 65  | 61  | 55  | 49  |
As you can work out, the median temperature for Raleigh is 59.5, while the median temperature for S.F. is 57. These two numbers are pretty close to each other, but we can’t conclude that there is no difference between the two cities with regard to monthly temperature. You can see that Raleigh has more variability. Variability is measured with the range or the standard deviation, among other statistics. You might also use the interquartile range (IQR) as a measure of variability. The quartiles divide the ordered data into four (roughly) equal parts; the IQR is how far apart the 25% line (the lower quartile) is from the 75% line (the upper quartile).
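To check the medians and compare the spread of the two cities, here is a minimal Python sketch. The function name `spread_summary` is made up for this illustration, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say which version of the standard deviation it has in mind.

```python
import statistics

# Monthly averages from the table, January through December.
raleigh = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
san_francisco = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]


def spread_summary(temps):
    """Return the median, range, and sample standard deviation of a list."""
    return {
        "median": statistics.median(temps),
        "range": max(temps) - min(temps),
        # Assumption: sample (n - 1) standard deviation.
        "stdev": statistics.stdev(temps),
    }


for city, temps in (("Raleigh", raleigh), ("S.F.", san_francisco)):
    s = spread_summary(temps)
    print(f"{city}: median={s['median']}, range={s['range']}, stdev={s['stdev']:.1f}")
```

The medians come out close together, while both the range and the standard deviation are noticeably larger for Raleigh.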
Let’s find the lower quartile for the Raleigh data. Here’s the complete data set in order:
39  42  43  50  51  59  60  67  71  74  77  78
There are 12 data values, so the median is the mean of the 6th and 7th values: (59 + 60)/2 = 59.5. To find the lower quartile, list all of the values below the median. Then find the median of that list.
39  42  43  50  51  59
There are 6 data values, so the median is the mean of the 3rd and 4th values: (43 + 50)/2 = 46.5. So 46.5 is the lower quartile. Find the upper quartile in the same manner, using the values above the median.
60  67  71  74  77  78
Upper quartile = (71 + 74)/2 = 72.5. Thus, the IQR is 72.5 (upper quartile) minus 46.5 (lower quartile) = 26.
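The steps above can be sketched in Python as follows. The function name `quartiles` is hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption; the worked example only covers an even count.

```python
import statistics


def quartiles(values):
    """Lower and upper quartiles as the medians of the lower and upper halves.

    Assumption: with an odd number of values, the middle value is left out
    of both halves. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the median
    upper_half = ordered[-half:]   # values above the median
    return statistics.median(lower_half), statistics.median(upper_half)


raleigh = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
q1, q3 = quartiles(raleigh)
iqr = q3 - q1
print(q1, q3, iqr)  # 46.5 72.5 26.0
```

Software often uses other quartile rules. Python's built-in `statistics.quantiles`, for example, interpolates between values by default and can give slightly different quartiles than the median-of-halves method used here.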
The median, quartiles, and extremes (minimum and maximum) of a distribution are called the five-number summary, which gives a quick description of the data. Here’s the five-number summary (plus the mean for comparison) for Raleigh: minimum 39, lower quartile 46.5, median 59.5, upper quartile 72.5, maximum 78 (mean 59.25).
These five numbers form the basis for a boxplot, sometimes called a box and whisker plot. To make a boxplot, draw a rectangle, or box, between the quartiles. Horizontal lines called whiskers are extended from the middle of the sides of the box to the extremes. Then the median is marked with a vertical line inside the box.
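As a rough illustration of that recipe, the sketch below draws a one-line text boxplot from a five-number summary. The function `ascii_boxplot`, the axis limits of 35 and 80, and the 41-character width are all choices made for this example only. The S.F. summary is not given in the text; it was worked out from the table with the same median-of-halves method.

```python
def ascii_boxplot(summary, axis_min, axis_max, width=41):
    """Draw a one-line text boxplot from a five-number summary.

    summary is (minimum, lower quartile, median, upper quartile, maximum).
    axis_min and axis_max set the scale; axis_max must exceed axis_min and
    every value in summary must lie between them.
    """
    def column(value):
        # Map a data value to a character position along the axis.
        return round((value - axis_min) / (axis_max - axis_min) * (width - 1))

    lo, q1, med, q3, hi = (column(v) for v in summary)
    line = [" "] * width
    for i in range(lo, hi + 1):
        line[i] = "-"              # whiskers reach out to the extremes
    for i in range(q1, q3 + 1):
        line[i] = "="              # box spans the two quartiles
    line[q1], line[q3] = "[", "]"  # box edges sit at the quartiles
    line[med] = "|"                # median marked inside the box
    return "".join(line)


raleigh = (39, 46.5, 59.5, 72.5, 78)
san_francisco = (49, 52.5, 57, 62.5, 65)  # computed the same way from the table

for name, summary in (("Raleigh", raleigh), ("S.F.", san_francisco)):
    print(f"{name:8s}{ascii_boxplot(summary, 35, 80)}")
```

Drawing both cities on the same axis makes the comparison visible: the medians line up closely, but the Raleigh box and whiskers are much longer.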
How do I know what plot to use?
| Type of plot | For what kind of data is this appropriate? | Advantages | Drawbacks |
|---|---|---|---|
| bar graph | comparing categorical (word) variables | simple, works well for categorical (word) variables | doesn’t make sense for quantitative (number) variables |
| circle graph | comparing categorical (word) variables | visually simple, makes sense to most people | doesn’t make sense for quantitative (number) variables, can be manipulated to distort data |
| dotplot | a single quantitative variable | keeps all data values, provides visual distribution for quantitative data | cumbersome for large data sets |
| stem plot | a single quantitative variable | keeps all data values, provides visual distribution for quantitative data, simplifies dotplot structure without losing detail | cumbersome for large data sets |
| histogram | a single quantitative variable | works for large data sets | loses some detail in bunching data into subranges |
| scatterplot | comparing two quantitative variables | can see trends and comparisons | sometimes difficult to recognize linear vs. nonlinear trends |
| boxplot | a single quantitative variable | simple, shows essential parts of a distribution | loses most of the detail in the data |