Box Plots

  • April 2020
Casino Games Activity 3


Data and statistics

What is a box and whisker plot?

The following table reports the average monthly temperatures (°F) for San Francisco, California, and for Raleigh, North Carolina. Dotplots of these twelve temperatures for each city appear below.

Month    Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Raleigh   39   42   50   59   67   74   78   77   71   60   51   43
S.F.      49   52   53   56   58   62   63   64   65   61   55   49
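If you want to draw the dotplots yourself, here is a minimal Python sketch that prints a text dotplot for each city from the table. The names `RALEIGH`, `SF`, and `dotplot` are our own, and using one character column per degree on a 35 to 80 axis is an assumption chosen to fit these data.

```python
# Average monthly temperatures (°F) from the table, January through December.
RALEIGH = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
SF = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]


def dotplot(values, lo, hi):
    """Return the rows of a text dotplot.

    Each character column stands for one degree from lo to hi (lo is assumed
    to be a multiple of 5). A repeated value stacks an extra 'o' above the first.
    """
    counts = {v: values.count(v) for v in values}
    rows = []
    for level in range(max(counts.values()), 0, -1):
        row = "".join("o" if counts.get(t, 0) >= level else " "
                      for t in range(lo, hi + 1))
        rows.append(row.rstrip())
    # Axis: '+' marks every multiple of 5, '-' every other degree.
    rows.append("".join("+" if t % 5 == 0 else "-" for t in range(lo, hi + 1)))
    # Tick labels placed under each '+'.
    labels = ""
    for t in range(lo, hi + 1, 5):
        labels = labels.ljust(t - lo) + str(t)
    rows.append(labels)
    return rows


for name, temps in [("Raleigh", RALEIGH), ("S.F.", SF)]:
    print(name)
    print("\n".join(dotplot(temps, 35, 80)))
    print()
```

The two 49s for S.F. (January and December) stack into a column two dots high; every Raleigh value is distinct, so its dotplot is a single row.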

As you can verify, the median temperature for Raleigh is 59.5, while the median temperature for S.F. is 57. These two numbers are close to each other, but from the medians alone we can’t conclude that there is no difference between the two cities’ monthly temperatures. You can see that Raleigh has more variability: its temperatures run from 39 to 78, while S.F.’s stay between 49 and 65. Variability can be measured with the range or the standard deviation, among other statistics. You might also use the interquartile range (IQR) as a measure of variability. To find the IQR, divide the data into four (roughly) equal parts, then find how far apart the 25% mark (the lower quartile) is from the 75% mark (the upper quartile).
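To check the medians and compare the spreads, here is a short sketch using Python’s standard `statistics` module. `statistics.median` averages the two middle values when the count is even, which matches the hand calculation for Raleigh; `statistics.stdev` gives the sample standard deviation. The helper name `spread_summary` is hypothetical.

```python
import statistics

# Average monthly temperatures (°F) from the table, January through December.
RALEIGH = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
SF = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]


def spread_summary(temps):
    """Center and two measures of variability for one city's temperatures."""
    return {
        "median": statistics.median(temps),  # sorts the data internally
        "range": max(temps) - min(temps),
        "stdev": statistics.stdev(temps),    # sample standard deviation
    }


for name, temps in [("Raleigh", RALEIGH), ("S.F.", SF)]:
    s = spread_summary(temps)
    print(f"{name}: median {s['median']}, range {s['range']}, "
          f"standard deviation {s['stdev']:.1f}")
```

Raleigh’s range (39) is more than twice S.F.’s (16), and its standard deviation is larger as well, even though the medians are only 2.5 degrees apart.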

Let’s find the lower quartile for the Raleigh data. Here’s the complete data set in order:

39, 42, 43, 50, 51, 59, 60, 67, 71, 74, 77, 78

There are 12 data values, so the median is the mean of the 6th and 7th values, 59 and 60: (59 + 60)/2 = 59.5. To find the lower quartile, list all of the values below the median, then find the median of that list:

39, 42, 43, 50, 51, 59

There are 6 values in this list, so its median is the mean of the 3rd and 4th values, 43 and 50: (43 + 50)/2 = 46.5. So 46.5 is the lower quartile. Find the upper quartile in the same way, using the values above the median:

60, 67, 71, 74, 77, 78

The upper quartile is (71 + 74)/2 = 72.5. Thus, the IQR is 72.5 (upper quartile) minus 46.5 (lower quartile), which is 26.
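The quartile steps above can be sketched in Python as follows. This follows one common convention (take the median of each half); the function names are hypothetical, and leaving the middle value out of both halves when the count is odd is an assumption, since the worked example has an even count. Library routines such as `statistics.quantiles` interpolate between data points by default and can return slightly different quartiles for the same data.

```python
import statistics

# Average monthly temperatures (°F) from the table, January through December.
RALEIGH = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
SF = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]


def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Split the sorted data at the median and take the median of each half.
    With an odd count, the middle value is left out of both halves
    (an assumption; the worked example only covers an even count).
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                  # values below the median
    upper = data[half + len(data) % 2:]  # values above the median
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1


for name, temps in [("Raleigh", RALEIGH), ("S.F.", SF)]:
    q1, med, q3 = quartiles(temps)
    print(f"{name}: Q1 {q1}, median {med}, Q3 {q3}, IQR {iqr(temps)}")
```

Applied to S.F., the same steps give quartiles of 52.5 and 62.5 and an IQR of 10, compared with 26 for Raleigh, which is another way to see Raleigh’s greater variability.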

The median, quartiles, and extremes (minimum and maximum) of a distribution are called the five-number summary, which gives a quick description of the data. Here’s the five-number summary (plus the mean for comparison) for Raleigh:

Minimum 39, lower quartile 46.5, median 59.5, upper quartile 72.5, maximum 78 (mean 59.25)

These five numbers form the basis for a boxplot, sometimes called a box and whisker plot. To make a boxplot, draw a rectangle, or box, between the quartiles. Then extend horizontal lines called whiskers from the middle of the left and right sides of the box out to the extremes. Finally, mark the median with a vertical line inside the box.
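A boxplot can also be sketched as a single line of text. This minimal sketch computes the five-number summary with the same median-of-halves rule and then marks it on a 35 to 80 scale, one character per degree. The symbols (`|` for the extremes, `[` and `]` for the quartiles, `M` for the median) and the helper names are our own choices, not a standard format.

```python
import statistics

# Average monthly temperatures (°F) from the table, January through December.
RALEIGH = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
SF = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]


def five_number_summary(values):
    """(minimum, lower quartile, median, upper quartile, maximum),
    with quartiles taken as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[half + len(data) % 2:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def text_boxplot(summary, lo, hi):
    """Draw a one-line boxplot, one character per degree from lo to hi.

    '|' marks the extremes, '-' the whiskers, '[' and ']' the quartiles
    (the box, filled with '='), and 'M' the median.
    """
    mn, q1, med, q3, mx = summary

    def col(x):
        return int(x - lo + 0.5)  # nearest column; halves round up

    chars = [" "] * (hi - lo + 1)
    for i in range(col(mn), col(mx) + 1):  # whiskers span the whole range
        chars[i] = "-"
    for i in range(col(q1), col(q3) + 1):  # box spans the quartiles
        chars[i] = "="
    chars[col(q1)], chars[col(q3)] = "[", "]"
    chars[col(med)] = "M"
    chars[col(mn)], chars[col(mx)] = "|", "|"
    return "".join(chars).rstrip()


print("Raleigh mean:", statistics.mean(RALEIGH))
for name, temps in [("Raleigh", RALEIGH), ("S.F.", SF)]:
    print(f"{name:<8}{text_boxplot(five_number_summary(temps), 35, 80)}")

# Scale labels every 5 degrees, lined up under the plots.
labels = ""
for t in range(35, 81, 5):
    labels = labels.ljust(t - 35) + str(t)
print(" " * 8 + labels)
```

Printing both cities on the same scale makes the comparison from the start of the activity visible at a glance: the medians are close, but Raleigh’s box and whiskers are much wider.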

How do I know what plot to use?

bar graph
  Appropriate for: comparing categorical (word) variables
  Advantages: simple; works well for categorical (word) variables
  Drawbacks: doesn’t make sense for quantitative (number) variables

circle graph
  Appropriate for: comparing categorical (word) variables
  Advantages: visually simple; makes sense to most people
  Drawbacks: doesn’t make sense for quantitative (number) variables; can be manipulated to distort data

dotplot
  Appropriate for: a single quantitative variable
  Advantages: keeps all data values; provides a visual distribution for quantitative data
  Drawbacks: cumbersome for large data sets

stem plot
  Appropriate for: a single quantitative variable
  Advantages: keeps all data values; provides a visual distribution for quantitative data; simplifies the dotplot structure without losing detail
  Drawbacks: cumbersome for large data sets

histogram
  Appropriate for: a single quantitative variable
  Advantages: works for large data sets
  Drawbacks: loses some detail by grouping data into subranges

scatterplot
  Appropriate for: comparing two quantitative variables
  Advantages: shows trends and comparisons
  Drawbacks: linear and nonlinear trends can sometimes be hard to tell apart

boxplot
  Appropriate for: a single quantitative variable
  Advantages: simple; shows the essential parts of a distribution
  Drawbacks: loses most of the detail in the data

