Describing distributions with Numbers | Part II | SHUBLEKA
Measuring the spread: standard deviation and variance
Variance is the average of the squares of the deviations of observations from their mean. The variance of n observations $x_1, x_2, \ldots, x_n$ is

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

The standard deviation is

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
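The two formulas can be checked numerically. The sketch below is only an illustration: the data values are made up, and the helper names `sample_variance` and `sample_std` are hypothetical. It computes the variance and standard deviation directly from the definitions and compares them with Python's standard `statistics` module, which also divides by n − 1.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(xs))


# Made-up observations, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("variance by hand:", sample_variance(data))
print("variance (statistics):", statistics.variance(data))
print("std dev by hand:", sample_std(data))
print("std dev (statistics):", statistics.stdev(data))
```

For this data the mean is 5 and the squared deviations add up to 32, so the variance is 32/7.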
• Some deviations are positive, some are negative.
• The sum of the deviations from the mean will always be zero.
• Squaring the deviations makes them all positive, so that extreme observations have large positive squares.
• The variance is the average squared deviation.

Two reasons for squaring:
• The sum of unsquared deviations is zero.
• The standard deviation turns out to be the natural measure of spread for the normal distribution.
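A small numerical check of these points, using made-up data: the deviations from the mean cancel out, their squares are all non-negative, and any one deviation can be recovered from the rest because the total must be zero.

```python
# Made-up observations with mean 8.
data = [3, 7, 8, 10, 12]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]   # some negative, some positive
squares = [d ** 2 for d in deviations]  # none negative

total = sum(deviations)

# Since the deviations sum to zero, the last one is minus the sum of the others.
last_from_others = -sum(deviations[:-1])

print("deviations:", deviations)
print("sum of deviations:", total)
print("squared deviations:", squares)
print("last deviation recovered from the others:", last_from_others)
```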
n − 1 = degrees of freedom. Because the deviations always sum to zero, the last deviation can be determined once we know the other n − 1, so only n − 1 deviations can vary freely.

Properties of the Standard Deviation
• s measures spread about the mean and should be used only when the mean is chosen as the measure of center.
• s = 0 only when there is no spread/variability; this happens when all the observations have the same value. Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.
• s, just like the mean, is not resistant. A few outliers can make the standard deviation very large.
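These properties can be illustrated with a few made-up data sets. The sketch below assumes nothing beyond Python's standard `statistics` module; the lists are invented for the illustration.

```python
import statistics

same = [5, 5, 5, 5]          # no variability at all
tight = [4, 5, 5, 6]         # observations close to the mean
wide = [1, 3, 7, 9]          # same mean, more spread out
with_outlier = tight + [50]  # one extreme observation added

for name, xs in [("same", same), ("tight", tight),
                 ("wide", wide), ("with_outlier", with_outlier)]:
    print(name,
          "mean =", statistics.mean(xs),
          "median =", statistics.median(xs),
          "s =", round(statistics.stdev(xs), 3))
```

In this sketch a single outlier moves the mean from 5 to 14 and inflates s many times over, while the median stays at 5.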
Choosing a summary
• The five-number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers.
• Use $\bar{x}$ and $s$ for reasonably symmetric distributions that are free of outliers.
• A graph gives the best overall picture of a distribution. Numerical measures of center and spread report specific facts about a distribution, but they do not describe its entire shape.

Minitab Demonstration: IQ Scores (Table 1_009)

Linear Transformations: change of units
$x_{new} = a + bx$
• Linear transformations do not change the shape of a distribution.
• Multiplying each observation by a positive number b multiplies the mean, median, interquartile range, and standard deviation by b.
• Adding a to each observation adds a to both measures of center (mean and median) but leaves the measures of spread (interquartile range and standard deviation) unchanged.
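The effect of a change of units can be checked with a short sketch. The example below is an assumption made for illustration: it converts made-up Celsius temperatures to Fahrenheit (a = 32, b = 1.8), and the helper `summarize` is hypothetical. Quartiles come from `statistics.quantiles`, whose default convention may give slightly different quartiles than a textbook or Minitab, but the scaling behaviour is the same.

```python
import statistics


def summarize(xs):
    """Return the mean, median, IQR and standard deviation of xs."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # one of several quartile conventions
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(xs),
    }


a, b = 32.0, 1.8                         # x_new = a + b * x
celsius = [10, 12, 15, 20, 23, 25, 30]   # made-up temperatures
fahrenheit = [a + b * x for x in celsius]

old = summarize(celsius)
new = summarize(fahrenheit)

for name in old:
    print(f"{name}: old = {old[name]:.3f}, new = {new[name]:.3f}")
```

The measures of center come out as a + b times the old value, while the interquartile range and standard deviation are only multiplied by b.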