Analysis of Variance (ANOVA)

One-Way Classification

Random samples of size n are selected from each of k populations. It is assumed that the k populations are independent and normally distributed with means $\mu_1, \mu_2, \ldots, \mu_k$ and common variance $\sigma^2$. We wish to derive appropriate methods for testing the hypothesis

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_A$: at least two of the means are not equal.
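Table 1. k random samples

$$
\begin{array}{c|cccccc|c}
\text{Population} & 1 & 2 & \cdots & i & \cdots & k & \\
\hline
 & x_{11} & x_{21} & \cdots & x_{i1} & \cdots & x_{k1} & \\
 & x_{12} & x_{22} & \cdots & x_{i2} & \cdots & x_{k2} & \\
 & \vdots & \vdots & & \vdots & & \vdots & \\
 & x_{1n} & x_{2n} & \cdots & x_{in} & \cdots & x_{kn} & \\
\hline
\text{Total} & T_1 & T_2 & \cdots & T_i & \cdots & T_k & T \\
\text{Mean} & \bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_i & \cdots & \bar{x}_k & \bar{x}
\end{array}
$$

Here $x_{ij}$ is the j-th observation from population i, $T_i$ and $\bar{x}_i$ are the total and mean of sample i, and $T$ and $\bar{x}$ are the grand total and grand mean of all nk observations.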
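The layout of Table 1 maps directly onto a list of samples. The following is a minimal Python sketch, assuming each inner list holds one sample so that `samples[i][j]` plays the role of $x_{ij}$; the function name `totals_and_means` and the numbers are hypothetical, chosen only for illustration.

```python
def totals_and_means(samples):
    """Return (T_i for each sample, x̄_i for each sample, grand total T, grand mean x̄)."""
    sample_totals = [sum(s) for s in samples]                             # T_i
    sample_means = [t / len(s) for t, s in zip(sample_totals, samples)]   # x̄_i = T_i / n
    grand_total = sum(sample_totals)                                      # T
    grand_mean = grand_total / sum(len(s) for s in samples)               # x̄ = T / (nk)
    return sample_totals, sample_means, grand_total, grand_mean


# Illustrative (made-up) data: k = 3 samples, each of size n = 4.
samples = [[4, 5, 6, 5], [7, 8, 6, 7], [5, 4, 6, 5]]
T_i, xbar_i, T, xbar = totals_and_means(samples)
print(T_i, xbar_i, T, round(xbar, 4))
```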
One-Way Sum of Squares Identity

$$\sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x})^2 = n\sum_{i=1}^{k}(\bar{x}_i-\bar{x})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)^2$$

Total sum of squares:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x})^2$$

Sum of squares for column means:
$$SSC = n\sum_{i=1}^{k}(\bar{x}_i-\bar{x})^2$$

Error sum of squares:
$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)^2$$

According to the one-way sum of squares identity, $SST = SSC + SSE$.
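The identity can be checked numerically by computing the three sums of squares straight from their definitions. This is a minimal Python sketch, assuming equal sample sizes n as in the definitions above; the function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(samples):
    """Definitional SST, SSC and SSE for k samples that all have the same size n."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    grand_mean = sum(sum(s) for s in samples) / (n * k)                    # x̄
    means = [sum(s) / n for s in samples]                                  # x̄_i
    sst = sum((x - grand_mean) ** 2 for s in samples for x in s)           # total
    ssc = n * sum((m - grand_mean) ** 2 for m in means)                    # column means
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)     # error
    return sst, ssc, sse


# Illustrative (made-up) data: k = 3, n = 4.
samples = [[4, 5, 6, 5], [7, 8, 6, 7], [5, 4, 6, 5]]
sst, ssc, sse = sums_of_squares(samples)
print(round(sst, 4), round(ssc, 4), round(sse, 4), round(ssc + sse, 4))
```

For these numbers SSC = 32/3 and SSE = 6, and their sum agrees with SST = 50/3 up to floating-point rounding.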
Steps of Working:

1. Set the null hypothesis, e.g. $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$.

2. Set the alternative hypothesis, e.g. $H_A$: at least two of the means are not equal.

3. Choose the level of significance $\alpha$. (Also decide whether the test is one-tailed or two-tailed; the ANOVA F test is always one-tailed, with the rejection region in the upper tail.)

4. Look up the critical value in the F-distribution table. (The F table is used because the test statistic is a ratio of two variance estimates.)
   a. For equal sample sizes: $f_{\alpha}[k-1,\ k(n-1)]$.
   b. For unequal sample sizes the error degrees of freedom are calculated differently: $f_{\alpha}(k-1,\ N-k)$, where $N = n_1 + n_2 + \cdots + n_k$ is the total number of observations.
   When every sample has size n, $N = nk$ and $N - k = k(n-1)$, so the two cases agree.
5. Computations (equal sample sizes). Sum of squares computational formulae:

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}^2 - \frac{T^2}{nk}$$

$$SSC = \frac{\sum_{i=1}^{k} T_i^2}{n} - \frac{T^2}{nk}$$

$$SSE = SST - SSC$$
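Table 2-a. ANOVA (ONE-WAY CLASSIFICATION), equal sample sizes

$$
\begin{array}{lcccc}
\text{Source of variation} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Mean square} & \text{Computed } f \\
\hline
\text{Column means} & SSC & k-1 & s_1^2 = \dfrac{SSC}{k-1} & f = \dfrac{s_1^2}{s_2^2} \\
\text{Error} & SSE & k(n-1) & s_2^2 = \dfrac{SSE}{k(n-1)} & \\
\text{Total} & SST & nk-1 & &
\end{array}
$$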
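Computations (unequal sample sizes). Sum of squares computational formulae, where sample i has size $n_i$ and $N = \sum_{i=1}^{k} n_i$:

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{T^2}{N}$$

$$SSC = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{T^2}{N}$$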
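$$SSE = SST - SSC$$

Table 2-b. ANOVA (ONE-WAY CLASSIFICATION), unequal sample sizes

$$
\begin{array}{lcccc}
\text{Source of variation} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Mean square} & \text{Computed } f \\
\hline
\text{Column means} & SSC & k-1 & s_1^2 = \dfrac{SSC}{k-1} & f = \dfrac{s_1^2}{s_2^2} \\
\text{Error} & SSE & N-k & s_2^2 = \dfrac{SSE}{N-k} & \\
\text{Total} & SST & N-1 & &
\end{array}
$$

6. Decision: Reject the null hypothesis $H_0$ and accept the alternative hypothesis when $f_{\text{cal}} > f_{\text{tab}}$, where $f_{\text{cal}}$ is the computed f and $f_{\text{tab}}$ is the critical value from step 4.

Practice Questions (from Walpole):
Example 1, page 392; exercises on page 400, Q2 and Q3.
Example 2, page 394; exercises on page 400, Q4 and Q5.

Q7.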
Four brands of flashlight batteries are to be compared by testing each brand on five flashlights. Twenty flashlights are randomly selected and divided randomly into four groups of five flashlights each. Then each group of flashlights uses a different brand of battery. The lifetimes of the batteries, to the nearest hour, are as follows.

Brand A   Brand B   Brand C   Brand D
  42        28        24        20
  30        36        36        32
  39        31        28        38
  28        32        28        28
  29        27        33        25

At the 5% significance level, does there appear to be a difference in mean lifetime among the four brands of batteries?

Q8.
A chain of convenience stores wanted to test three different advertising policies:
Policy 1: No advertising.
Policy 2: Advertising in neighborhoods with circulars.
Policy 3: Use circulars and advertise in newspapers.
Eighteen stores were randomly selected and divided randomly into three groups of six stores. Each group used one of the three policies. Following the implementation of the policies, sales figures were obtained for each of the stores during a 1-month period. The figures are displayed, in thousands of dollars, in the following table.
Policy 1   Policy 2   Policy 3
   22         21         29
   20         25         24
   26         25         31
   21         20         32
   24         22         26
   22         26         27

Do the data provide evidence of a difference in mean monthly sales among the three policies? Perform the required hypothesis test at the 1% significance level.

Q9. The Bureau of Labor Statistics publishes data on weekly earnings of nonsupervisory workers in Employment and Earnings. The following data, in dollars, were obtained from random samples of (full- and part-time) workers in five service-producing industries: Transport; Wholesale trade; Retail trade; Finance, Insurance, Real Estate; and Services.

467 402 208 424 364 507 347 136 378 376 468 327 118 460 383 512 396 246 346 299 559 380 133 336 490 227 273

Do the data provide sufficient evidence to conclude that a difference exists in mean weekly earnings among nonsupervisory workers in the five industries? Perform the required hypothesis test at the 0.05 level of significance.
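Steps 4 to 6 of the procedure can be collected into one short routine. The following is a minimal Python sketch, assuming the samples are plain lists of numbers (sizes may differ) and that the caller supplies the tabulated critical value $f_{\text{tab}}$ from step 4; the function name `one_way_anova` and the data are hypothetical. With equal sizes, $T^2/N$ reduces to $T^2/(nk)$ and $N-k$ to $k(n-1)$, so one routine covers both Table 2-a and Table 2-b. If SciPy is available, `scipy.stats.f.ppf(1 - alpha, k - 1, N - k)` gives the same critical value as the table, and `scipy.stats.f_oneway` can serve as a cross-check on the computed f.

```python
def one_way_anova(samples, f_tab):
    """One-way ANOVA using the computational formulae (steps 5 and 6).

    samples: k lists of observations; the sizes n_i may differ.
    f_tab:   critical value f_alpha(k - 1, N - k) read from an F table.
    Returns (anova_table, f_cal, reject_h0).
    """
    k = len(samples)
    sizes = [len(s) for s in samples]             # n_i
    N = sum(sizes)                                # total number of observations
    totals = [sum(s) for s in samples]            # T_i
    T = sum(totals)                               # grand total
    correction = T ** 2 / N                       # T^2/N, i.e. T^2/(nk) when all n_i = n

    sst = sum(x ** 2 for s in samples for x in s) - correction
    ssc = sum(t ** 2 / n for t, n in zip(totals, sizes)) - correction
    sse = sst - ssc

    df_col, df_err = k - 1, N - k                 # N - k equals k(n - 1) for equal sizes
    s1_sq = ssc / df_col                          # mean square for column means
    s2_sq = sse / df_err                          # error mean square
    f_cal = s1_sq / s2_sq

    anova_table = [
        ("Column means", ssc, df_col, s1_sq),
        ("Error", sse, df_err, s2_sq),
        ("Total", sst, N - 1, None),
    ]
    return anova_table, f_cal, f_cal > f_tab      # step 6: reject H0 if f_cal > f_tab


# Illustrative (made-up) data: k = 3 samples of size n = 4, so the degrees of
# freedom are (2, 9); f_0.05(2, 9) = 4.26 from a standard F table.
samples = [[4, 5, 6, 5], [7, 8, 6, 7], [5, 4, 6, 5]]
anova_table, f_cal, reject = one_way_anova(samples, f_tab=4.26)
for source, ss, df, ms in anova_table:
    ms_text = "" if ms is None else f"{ms:10.3f}"
    print(f"{source:<13}{ss:10.3f}{df:4d}{ms_text}")
print(f"f = {f_cal:.3f}, reject H0: {reject}")
```

For these made-up numbers the computed f is 8, which exceeds 4.26, so $H_0$ would be rejected at the 5% level.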