ANOVA Analysis of Variance
Prof. M.K.Tiwari Department of Industrial Engineering and Management IIT Kharagpur
Analysis of variance (ANOVA)
ANOVA assesses the extent to which the distributions of two or more variables overlap. The more the distributions overlap, the less likely it is that they are different.
Analysis of Variance
Analysis of variance (ANOVA), also called the F test, was designed to overcome the shortcomings of the t test when more than two means are compared. An ANOVA with one independent variable (IV) that has only two levels is equivalent to a t test.
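The equivalence between a two-level ANOVA and a t test can be checked numerically. The following is a minimal sketch, assuming a pooled-variance (equal-variance) two-sample t test; the two groups of scores and the helper names `pooled_t` and `one_way_f` are hypothetical and used only for illustration. For two groups, the F statistic should equal the square of the t statistic.

```python
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5


def one_way_f(groups):
    """F statistic of a one-way ANOVA for a list of groups."""
    all_obs = [v for g in groups for v in g]
    grand = mean(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for the two levels of one independent variable.
group_a = [4.0, 5.0, 6.0, 7.0, 8.0]
group_b = [6.0, 8.0, 9.0, 10.0, 12.0]

t = pooled_t(group_a, group_b)
f = one_way_f([group_a, group_b])
print(f"t^2 = {t ** 2:.4f}, F = {f:.4f}")
```

The two printed values agree, which is the sense in which the two tests are "the same" when there are only two levels.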
The Logic of ANOVA
Hypothesis testing in ANOVA is about whether the means of the samples differ more than you would expect if the null hypothesis were true. This question about means is answered by analyzing variances.
Among other reasons, you focus on variances because when you want to know how several means differ, you are asking about the variances among those means.
Two Sources of Variability
In ANOVA, an estimate of variability between groups is compared with variability within groups.
Between-group variation is the variation among the means of the different treatment conditions due to chance (random sampling error) and treatment effects, if any exist.
Within-group variation is the variation due to chance (random sampling error) among individuals given the same treatment.
[Diagram: ANOVA partitions the total variation among scores into two parts]
Within-Groups Variation: variation due to chance.
Between-Groups Variation: variation due to chance and treatment effect (if any exists).
Variability Between Groups
There is a lot of variability from one mean to the next. Large differences between means probably are not due to chance. It is difficult to imagine that all six groups are random samples taken from the same population. The null hypothesis is rejected, indicating a treatment effect in at least one of the groups.
Variability Within Groups
The amount of variability between group means is the same as in the previous case. However, there is more variability within each group. The larger the variability within each group, the less confident we can be that we are dealing with samples drawn from different populations.
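The effect of within-group variability can be illustrated with a small sketch. The two data sets below are hypothetical: both have group means of 10, 15 and 20, but the second has much more spread inside each group. The helper name `mean_squares` is an assumption made for this example.

```python
from statistics import mean


def mean_squares(groups):
    """Return (between-groups mean square, within-groups mean square)."""
    all_obs = [v for g in groups for v in g]
    grand = mean(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_obs) - len(groups))
    return ms_between, ms_within


# Hypothetical data: identical group means, different within-group spread.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]
spread = [[4, 10, 16], [9, 15, 21], [14, 20, 26]]

for name, groups in (("tight", tight), ("spread", spread)):
    ms_b, ms_w = mean_squares(groups)
    print(f"{name}: MS between = {ms_b:.2f}, MS within = {ms_w:.2f}, "
          f"F = {ms_b / ms_w:.2f}")
```

The between-groups mean square is the same for both data sets, so the drop in the F ratio for the second set comes entirely from the larger within-groups variability.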
Completely Randomized Experiment and Analysis of Variance
Suppose we have a different levels of a single factor to be compared (Table 1), where yij represents the jth observation taken under treatment i.

Table 1: Typical data for a single-factor experiment

Treatment (level)    Observations            Totals    Averages
1                    y11   y12   ...   y1n   y1.       ȳ1.
2                    y21   y22   ...   y2n   y2.       ȳ2.
.                    .     .     ...   .     .         .
.                    .     .     ...   .     .         .
a                    ya1   ya2   ...   yan   ya.       ȳa.
                                             y..       ȳ..
• The levels of the factor are sometimes called treatments.
• Each treatment has n observations or replicates.
• The runs are performed in random order.
The observations may be described by the linear statistical model
$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n$$
where
$\mu$: overall mean
$\tau_i$: parameter associated with the ith treatment (ith treatment effect)
$\varepsilon_{ij}$: random error component
The model can be written as
$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, n$$
where $\mu_i = \mu + \tau_i$ is the mean of the ith treatment.
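To make the model concrete, the sketch below generates data from the overall mean, the treatment effects and a random error term. It assumes normally distributed errors with a common standard deviation `sigma`, which the text above does not state; the function name `simulate_crd` and all numerical values are hypothetical.

```python
import random
from statistics import mean


def simulate_crd(mu, taus, n, sigma, seed=0):
    """Generate y_ij = mu + tau_i + eps_ij for a treatments and n replicates.

    Assumption: eps_ij is drawn from a normal distribution with mean 0
    and standard deviation sigma.
    """
    rng = random.Random(seed)
    return [[mu + tau + rng.gauss(0.0, sigma) for _ in range(n)]
            for tau in taus]


mu = 15.0                            # hypothetical overall mean
taus = [-5.0, 0.0, 2.0, 6.0, -3.0]   # hypothetical treatment effects
data = simulate_crd(mu, taus, n=5, sigma=2.0)

for i, (tau, row) in enumerate(zip(taus, data), start=1):
    print(f"treatment {i}: mu_i = {mu + tau:5.1f}, "
          f"sample mean = {mean(row):6.2f}")
```

Each sample mean scatters around its treatment mean mu_i = mu + tau_i; the scatter is the contribution of the random error component.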
Completely Randomized Experiment and Analysis of Variance
Example 1
A development engineer is interested in determining whether the cotton weight percentage in a synthetic fiber affects its tensile strength. She has run a completely randomized experiment with five levels of cotton weight percentage and five replicates. The data are given below.

Observed tensile strength (lb/in²)

              Cotton weight (%)
Replicate     15      20      25      30      35
1             7       12      14      19      7
2             7       17      18      25      10
3             15      12      18      22      11
4             11      18      19      19      15
5             9       18      19      23      11
Totals        49      77      88      108     54      Grand total: 376
Averages      9.8     15.4    17.6    21.6    10.8    Grand average: 15.04
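• We have to test the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ against $H_1$: some means are different.
The sums of squares are computed as follows:
$$SS_T = \sum_{i=1}^{5} \sum_{j=1}^{5} y_{ij}^2 - \frac{y_{..}^2}{N} = 636.96$$
$$SS_{Treatments} = \frac{1}{n} \sum_{i=1}^{5} y_{i.}^2 - \frac{y_{..}^2}{N} = 475.76$$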
$$SS_E = SS_T - SS_{Treatments} = 161.20$$
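The three sums of squares can be reproduced directly from the Example 1 data. The following is a minimal sketch in plain Python; the variable names are chosen for this illustration only.

```python
# Tensile strength (lb/in^2) from Example 1, keyed by cotton weight (%).
strength = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

n = 5                       # replicates per treatment
N = n * len(strength)       # total number of observations

grand_total = sum(sum(obs) for obs in strength.values())   # y..
correction = grand_total ** 2 / N                          # y..^2 / N

# Total sum of squares: sum of squared observations minus the correction.
ss_total = sum(y ** 2 for obs in strength.values() for y in obs) - correction

# Treatment sum of squares: squared treatment totals over n, minus correction.
ss_treatments = sum(sum(obs) ** 2 for obs in strength.values()) / n - correction

# Error sum of squares by subtraction.
ss_error = ss_total - ss_treatments

print(f"SS_T = {ss_total:.2f}")
print(f"SS_Treatments = {ss_treatments:.2f}")
print(f"SS_E = {ss_error:.2f}")
```

The printed values match 636.96, 475.76 and 161.20 from the hand calculation.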
ANOVA table for the above data

Source of variation         Sum of squares    Degrees of freedom    Mean square    F0       P-value
Cotton weight percentage    475.76            4                     118.94         14.76    <0.01
Error                       161.20            20                    8.06
Total                       636.96            24
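The remaining table entries follow from the sums of squares and the degrees of freedom. The sketch below starts from the sums of squares reported above and computes the mean squares and F0. The critical value 4.43 for F with (4, 20) degrees of freedom at the 0.01 level is taken from standard F tables and is not part of the original slides; it is included only to illustrate why the P-value is below 0.01.

```python
a = 5     # number of treatments (cotton weight levels)
N = 25    # total number of observations

ss_treatments = 475.76
ss_total = 636.96
ss_error = ss_total - ss_treatments

df_treatments = a - 1     # 4
df_error = N - a          # 20
df_total = N - 1          # 24

ms_treatments = ss_treatments / df_treatments
ms_error = ss_error / df_error
f0 = ms_treatments / ms_error

# Assumption: tabulated upper 1% point of F(4, 20) from standard F tables.
F_CRIT_01 = 4.43

print(f"MS_Treatments = {ms_treatments:.2f}")
print(f"MS_E = {ms_error:.2f}")
print(f"F0 = {f0:.2f}")
print("Reject H0 at the 0.01 level" if f0 > F_CRIT_01 else "Fail to reject H0")
```

Because F0 is well above the tabulated critical value, the data support the conclusion that cotton weight percentage affects tensile strength.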