Data Presentation and Descriptive Statistics

A manufacturer is investigating the operating life of laptop computer batteries. The following data are available (this data set is referred to below as Battery 1).

Life (min.)
130 164 145 140 131 125 126 147 156 132
145 130 129 127 126 132 135 136 146 142
126 132 133 139 145 126 131 129 130 132
146 152 155 137 148 126 129 136 146 132
Using the first two digits as the stem and the last digit as the leaf, we may develop the following plot. The leaves are recorded in the order in which they appear in the data, row by row:

Stem | Leaves                          Freq.
 12  | 5 6 9 7 6 6 6 9 6 9                10
 13  | 0 1 2 0 2 5 6 2 3 9 1 0 2 7 6 2    16
 14  | 5 0 7 5 6 2 5 6 8 6                10
 15  | 6 2 5                               3
 16  | 4                                   1
Stem-and-leaf plot

The plot shows that most of the data are clustered around 130: 26 of the 40 values lie in the 120s and 130s, and only four values exceed 150. One may conclude that the center of the data is somewhere in the 130s. Variation is harder to judge. At this stage, whether the variability is high or low can only be determined on a comparative basis. If another data set is available (for example, for another brand), a back-to-back stem-and-leaf plot can be used to compare the variability of the two sets visually. Ordering the leaves within each stem gives the following plot:
Stem | Leaves           Freq.
 12  | 5666667999          10
 13  | 0001122222356679    16
 14  | 0255566678          10
 15  | 256                  3
 16  | 4                    1
Ordered stem-and-leaf plot
Several measures of central tendency and dispersion can be read directly from the ordered plot above: Minimum = 125, Maximum = 164, Range = 164 - 125 = 39. Mode = 126 and 132 (each occurs 5 times, so the data are bimodal).
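The stem-and-leaf plots and the values read from them can be reproduced with a short Python sketch. It assumes the data are entered row by row from the table above. The helper names `stem_and_leaf` and `print_plot` are hypothetical. `statistics.multimode` (Python 3.8+) returns every most-frequent value, which makes the bimodality explicit.

```python
from collections import defaultdict
from statistics import multimode

# Battery 1 operating life (min.), entered row by row from the table.
battery1 = [
    130, 164, 145, 140, 131, 125, 126, 147, 156, 132,
    145, 130, 129, 127, 126, 132, 135, 136, 146, 142,
    126, 132, 133, 139, 145, 126, 131, 129, 130, 132,
    146, 152, 155, 137, 148, 126, 129, 136, 146, 132,
]


def stem_and_leaf(data, ordered=False):
    """Map each stem (first two digits) to its leaves (last digit)."""
    plot = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    if ordered:
        for leaves in plot.values():
            leaves.sort()
    return dict(sorted(plot.items()))  # stems in increasing order


def print_plot(plot):
    """Print one row per stem: stem | leaves | frequency."""
    for stem, leaves in plot.items():
        print(f"{stem} | {''.join(map(str, leaves)):<16} {len(leaves):>3}")


print_plot(stem_and_leaf(battery1))                # leaves in data order
print_plot(stem_and_leaf(battery1, ordered=True))  # leaves sorted per stem

low, high = min(battery1), max(battery1)
print("Minimum =", low, "Maximum =", high, "Range =", high - low)
# prints: Minimum = 125 Maximum = 164 Range = 39
print("Mode(s) =", multimode(battery1))
# prints: Mode(s) = [126, 132]  (two modes: bimodal)
```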
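Since n = 40 is even, the median is the average of the 20th and 21st ordered values:

$$\tilde{x} = \frac{x_{[20]} + x_{[21]}}{2} = \frac{132 + 133}{2} = 132.5$$

Other measures require some calculations:

$$\bar{x} = \frac{\sum_{i=1}^{40} x_i}{40} = \frac{130 + 164 + \cdots + 146 + 132}{40} = 136.85$$

These results confirm our initial conclusion that the center is in the 130s. The sample variance divides the sum of squared deviations by n - 1 = 39. The standard deviation is the square root of the variance:

$$S^2 = \frac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{39} = \frac{(130 - 136.85)^2 + \cdots + (132 - 136.85)^2}{39} = 95.87$$

$$s = \sqrt{S^2} = 9.79$$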
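The median, mean, variance and standard deviation above can be checked with the sketch below. It computes each quantity from its definition. It then cross-checks the results against Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample divisor n - 1, as in the formulas above.

```python
import math
import statistics

# Battery 1 operating life (min.), entered row by row from the table.
battery1 = [
    130, 164, 145, 140, 131, 125, 126, 147, 156, 132,
    145, 130, 129, 127, 126, 132, 135, 136, 146, 142,
    126, 132, 133, 139, 145, 126, 131, 129, 130, 132,
    146, 152, 155, 137, 148, 126, 129, 136, 146, 132,
]

n = len(battery1)  # 40
ordered = sorted(battery1)

# n is even, so the median averages the 20th and 21st ordered values
# (indices 19 and 20 with Python's 0-based indexing).
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
mean = sum(battery1) / n

# Sample variance S^2: sum of squared deviations divided by n - 1 = 39.
variance = sum((x - mean) ** 2 for x in battery1) / (n - 1)
std_dev = math.sqrt(variance)  # s = sqrt(S^2)

print(f"Median = {median}, Mean = {mean:.2f}")
print(f"S^2 = {variance:.2f}, s = {std_dev:.2f}")
# prints: Median = 132.5, Mean = 136.85
#         S^2 = 95.87, s = 9.79

# Cross-check against the standard library (variance/stdev use n - 1).
assert median == statistics.median(battery1)
assert math.isclose(variance, statistics.variance(battery1))
assert math.isclose(std_dev, statistics.stdev(battery1))
```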
Now, let us assume that another data set of 40 points is available for another brand of batteries (Battery 2).

Life (min.)
134 143 150 143 148 151 151 152 142 122
130 134 135 140 146 138 128 142 146 134
140 136 160 138 140 151 146 144 142 145
151 144 141 141 146 139 147 134 136 147
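The measures of center and dispersion for Battery 2 are: Minimum = 122, Maximum = 160, Range = 160 - 122 = 38. Mode = 134, 146 and 151 (each occurs 4 times, so the data are multimodal).

$$\tilde{x} = \frac{x_{[20]} + x_{[21]}}{2} = \frac{142 + 142}{2} = 142$$

$$\bar{x} = \frac{\sum_{i=1}^{40} x_i}{40} = \frac{5677}{40} = 141.925 \approx 142$$

The average and the median nearly coincide, which suggests that the data are approximately symmetric.

$$S^2 = \frac{\sum_{i=1}^{40} (x_i - \bar{x})^2}{39} = 55.2$$

$$s = \sqrt{S^2} = 7.43$$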
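A small helper makes it easy to recompute all of these measures for any data set. `describe` is a hypothetical name used only for this sketch. It relies solely on the standard `statistics` module, whose `variance` and `stdev` use the n - 1 divisor. The data are assumed to be entered row by row from the Battery 2 table.

```python
from statistics import mean, median, multimode, stdev, variance

# Battery 2 operating life (min.), entered row by row from the table.
battery2 = [
    134, 143, 150, 143, 148, 151, 151, 152, 142, 122,
    130, 134, 135, 140, 146, 138, 128, 142, 146, 134,
    140, 136, 160, 138, 140, 151, 146, 144, 142, 145,
    151, 144, 141, 141, 146, 139, 147, 134, 136, 147,
]


def describe(data):
    """Hypothetical helper returning the summary measures used in this section."""
    return {
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "modes": sorted(multimode(data)),  # all values tied for highest count
        "median": median(data),
        "mean": mean(data),
        "variance": variance(data),        # sample variance, divisor n - 1
        "s": stdev(data),                  # sample standard deviation
    }


summary = describe(battery2)
for name, value in summary.items():
    shown = f"{value:.2f}" if isinstance(value, float) else value
    print(f"{name:>8}: {shown}")
```

Comparing `summary["mean"]` with `summary["median"]` is the same rough symmetry check made in the text. It is a heuristic, not a formal test.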
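These results show numerically that Battery 2 has a higher average life (about 142 vs. 136.85 min) with less variation (s = 7.43 vs. 9.79 min). An easy way to compare the two sets graphically is a back-to-back stem-and-leaf plot. Both data sets share a common stem column, with the Battery 2 leaves listed to the left (read outward from the stem) and the Battery 1 leaves to the right.

Freq.            Battery 2 | Stem | Battery 1        Freq.
    2                   82 |  12  | 5666667999          10
   11          98866544440 |  13  | 0001122222356679    16
   20 87766665443322211000 |  14  | 0255566678          10
    6               211110 |  15  | 256                  3
    1                    0 |  16  | 4                    1
Back-to-back stem-and-leaf plot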
The plot above shows that half of the Battery 2 values (20 of 40) are in the 140s. By contrast, the largest group of Battery 1 values (16 of 40) is in the 130s. The spread (variability) of Battery 2 is also smaller than that of Battery 1. Based on these results, we may conclude that Battery 2 is the better brand (higher average and lower variability). The validity of this conclusion, however, depends on how the data were collected and on whether the sample size n is sufficient. These issues are typically discussed as part of Inferential Statistics and Design of Experiments.
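The back-to-back plot can be generated with the sketch below. The helper names `leaves_by_stem` and `back_to_back` are hypothetical, and the fixed leaf-column width of 20 characters is an assumption chosen to fit these data. Leaves of the left-hand data set are sorted in decreasing order so that, as in the plot, they read outward from the common stem column.

```python
from collections import defaultdict

battery1 = [
    130, 164, 145, 140, 131, 125, 126, 147, 156, 132,
    145, 130, 129, 127, 126, 132, 135, 136, 146, 142,
    126, 132, 133, 139, 145, 126, 131, 129, 130, 132,
    146, 152, 155, 137, 148, 126, 129, 136, 146, 132,
]
battery2 = [
    134, 143, 150, 143, 148, 151, 151, 152, 142, 122,
    130, 134, 135, 140, 146, 138, 128, 142, 146, 134,
    140, 136, 160, 138, 140, 151, 146, 144, 142, 145,
    151, 144, 141, 141, 146, 139, 147, 134, 136, 147,
]


def leaves_by_stem(data):
    """Group leaves (last digit) under stems (first two digits)."""
    groups = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)
        groups[stem].append(leaf)
    return groups


def back_to_back(left, right):
    """Return the rows of a back-to-back stem-and-leaf plot as strings.

    Left leaves are sorted in decreasing order (reading outward from the
    stem); right leaves are sorted in increasing order.
    """
    left_groups, right_groups = leaves_by_stem(left), leaves_by_stem(right)
    first = min(min(left), min(right)) // 10
    last = max(max(left), max(right)) // 10
    rows = []
    for stem in range(first, last + 1):
        lhs = "".join(map(str, sorted(left_groups[stem], reverse=True)))
        rhs = "".join(map(str, sorted(right_groups[stem])))
        rows.append(f"{len(lhs):>3} {lhs:>20} | {stem} | {rhs:<20} {len(rhs):>3}")
    return rows


print("Battery 2 (left) | stem | Battery 1 (right)")
for row in back_to_back(battery2, battery1):
    print(row)
```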
A better graphical comparison tool is the box (box-and-whisker) plot. A plot for both data sets is shown below.
[Figure: Box plot of operating life (min.) for Battery 1 and Battery 2]

The plot above supports our previous conclusion: the interquartile range (the box) of Battery 2 is shorter than that of Battery 1 (less variability) and is shifted to the right (higher center).
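A box plot draws the five-number summary: minimum, Q1, median, Q3 and maximum. The sketch below computes these values directly. It assumes one common hand-calculation convention, which takes Q1 and Q3 as the medians of the lower and upper halves of the ordered data. Software packages (including `statistics.quantiles`) interpolate quartiles differently and may report slightly different values. `five_number_summary` is a hypothetical helper name.

```python
from statistics import median

battery1 = [
    130, 164, 145, 140, 131, 125, 126, 147, 156, 132,
    145, 130, 129, 127, 126, 132, 135, 136, 146, 142,
    126, 132, 133, 139, 145, 126, 131, 129, 130, 132,
    146, 152, 155, 137, 148, 126, 129, 136, 146, 132,
]
battery2 = [
    134, 143, 150, 143, 148, 151, 151, 152, 142, 122,
    130, 134, 135, 140, 146, 138, 128, 142, 146, 134,
    140, 136, 160, 138, 140, 151, 146, 144, 142, 145,
    151, 144, 141, 141, 146, 139, 147, 134, 136, 147,
]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the ordered
    data (the middle value is excluded from both halves when n is odd).
    """
    x = sorted(data)
    half = len(x) // 2
    lower = x[:half]
    upper = x[half:] if len(x) % 2 == 0 else x[half + 1:]
    return x[0], median(lower), median(x), median(upper), x[-1]


for name, data in [("Battery 1", battery1), ("Battery 2", battery2)]:
    low, q1, med, q3, high = five_number_summary(data)
    print(f"{name}: min={low}, Q1={q1}, median={med}, Q3={q3}, "
          f"max={high}, IQR={q3 - q1}")
```

Under this convention, the interquartile range is 145 - 129.5 = 15.5 min for Battery 1 and 146.5 - 137 = 9.5 min for Battery 2. This is consistent with the shorter box for Battery 2.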