Pampers Develops a Rash
Group 7
Introduction
The mean of every variable is higher in the high brand preference group than in the low brand preference group. This suggests that all 9 variables help discriminate between the low preference group and the high preference group.
To confirm this, we run an ANOVA for each variable. The p-value for every variable is less than 0.05, so all variables are significant in discriminating between the two groups.
The correlation matrix of the predictor variables shows that the correlations between count per box and price, style and unisex, and leakage and absorbency are each greater than 0.7. Therefore we have a multicollinearity problem.
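The per-variable test can be sketched as a one-way ANOVA F statistic for a single predictor across the two preference groups. The ratings below are made-up illustration values, not the survey data, and the function name is hypothetical.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times the squared
    # distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ratings of one attribute in each group (assumption, not real data).
low_pref = [3, 4, 2, 5, 3, 4]
high_pref = [6, 7, 5, 6, 7, 6]

f_stat, df_b, df_w = one_way_anova_f(low_pref, high_pref)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The F statistic is then referred to the F distribution with (df_between, df_within) degrees of freedom to obtain the p-value, which a statistics package reports directly.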
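A minimal sketch of the screening step: compute pairwise Pearson correlations and flag every pair above the 0.7 threshold. The data values and the helper names are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def high_correlation_pairs(data, threshold=0.7):
    """Return (var_a, var_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical ratings from six respondents (not the survey data).
data = {
    "leakage": [1, 2, 3, 4, 5, 6],
    "absorbency": [2, 2, 4, 4, 6, 5],
    "price": [5, 1, 4, 2, 6, 3],
}

flagged = high_correlation_pairs(data)
print(flagged)
```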
Factor analysis
To deal with the multicollinearity problem, we conduct factor analysis.
Conditions to apply factor analysis:
• Sample size: 300 > 5 × 9 (variables)
• Bartlett's test of sphericity: significant
• KMO: value is greater than 0.5
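The three conditions can be sketched as follows, using the standard formulas for Bartlett's chi-square statistic and the overall KMO measure. The 3 × 3 correlation matrix is a hypothetical stand-in for the real 9 × 9 matrix, and the function names are assumptions; the p-value lookup for Bartlett's statistic is left to a statistics package.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = R.shape[0]
    chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(R)
    # Partial correlations are obtained from the inverse correlation matrix.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(R.shape[0], dtype=bool)  # mask for off-diagonal entries
    r2 = np.sum(R[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)

# Sample-size rule from the text: at least 5 cases per variable.
n_cases, n_vars = 300, 9
size_ok = n_cases > 5 * n_vars

# Hypothetical correlation matrix (illustration only).
R = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.7],
              [0.6, 0.7, 1.0]])

chi_square, dof = bartlett_sphericity(R, n_cases)
kmo_value = kmo(R)
print(size_ok, round(chi_square, 1), dof, round(kmo_value, 3))
```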
The extracted factors explain 91% and 91.4% of the variance (communality) in the Unisex and Style variables respectively.
Factors with an eigenvalue greater than 1 are selected. The three factors together explain 81.396% of the variance in the entire data set. After rotation, the percentages of variance explained by factors 1, 2 and 3 are 32.764%, 27.705% and 20.928% respectively.
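The eigenvalue rule can be illustrated on a small example: keep the factors whose eigenvalue exceeds 1 and read each eigenvalue divided by the number of variables as a share of total variance. The 4 × 4 correlation matrix is hypothetical; only the three rotated percentages are taken from the text.

```python
import numpy as np

def kaiser_selection(R):
    """Eigenvalues (descending), keep-mask for eigenvalue > 1, variance shares."""
    eigenvalues = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    keep = eigenvalues > 1.0
    explained = eigenvalues / R.shape[0]       # total variance equals the number of variables
    return eigenvalues, keep, explained

# Hypothetical correlation matrix with two blocks of related variables.
R = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])

eigenvalues, keep, explained = kaiser_selection(R)
print("factors kept:", int(keep.sum()))
print("variance explained by kept factors:", round(float(explained[keep].sum()), 3))

# Rotated shares reported in the text; their sum agrees with the
# reported cumulative figure up to rounding of the individual shares.
rotated_shares = [32.764, 27.705, 20.928]
total_share = sum(rotated_shares)
print("sum of rotated shares:", round(total_share, 3))
```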
• In the rotated component matrix, 0.75 is chosen as the cut-off value for factor loadings
• Factor 1 comprises X6, X7, X8 and X9 and can be named BASIC FUNCTIONALITY
• Factor 2 comprises X1, X2 and X3 and can be named VALUE FOR MONEY
• Factor 3 comprises X4 and X5 and can be named ADD-ON FEATURES
• These 3 factors are independent of each other, so the correlation between them is 0
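The assignment of variables to factors by the 0.75 cut-off can be sketched as below. The loading values are invented for illustration and chosen only so that the grouping matches the one described above; the actual rotated component matrix is not reproduced here.

```python
CUTOFF = 0.75

# Hypothetical rotated loadings: variable -> (factor 1, factor 2, factor 3).
loadings = {
    "X1": (0.10, 0.88, 0.05),
    "X2": (0.12, 0.85, 0.10),
    "X3": (0.08, 0.80, 0.15),
    "X4": (0.05, 0.11, 0.93),
    "X5": (0.09, 0.07, 0.94),
    "X6": (0.86, 0.10, 0.06),
    "X7": (0.83, 0.14, 0.02),
    "X8": (0.90, 0.05, 0.11),
    "X9": (0.79, 0.20, 0.08),
}

def assign_factors(loadings, cutoff=CUTOFF):
    """Map factor number -> variables whose absolute loading meets the cutoff."""
    groups = {}
    for variable, row in loadings.items():
        for factor, value in enumerate(row, start=1):
            if abs(value) >= cutoff:
                groups.setdefault(factor, []).append(variable)
    return groups

groups = assign_factors(loadings)
for factor in sorted(groups):
    print(f"Factor {factor}: {groups[factor]}")
```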
Discriminant Analysis
We now use these 3 factors, namely basic functionality, value for money and add-on features, as the independent variables and brand preference as the grouping variable for discriminant analysis.
The mean of every variable is lower in the low preference group than in the high preference group.
The standard deviations of value for money and add-on features are greater in the low brand preference group than in the high brand preference group.
As the significance value is less than 0.05 for each of them, all 3 variables are significant in discriminating between low brand preference and high brand preference.
The correlation between any pair of predictor variables does not exceed 0.75, so we do not have a multicollinearity problem.
Unstandardized Discriminant function
Y = 0.877X1 + 0.658X2 + 0.797X3
Y = discriminant score
X1 = basic functionality
X2 = value for money
X3 = add-on features
Canonical correlation: the correlation coefficient between the discriminant score and the corresponding group membership. Its square, (0.676)^2 ≈ 0.457, means that about 45.7% of the variance between high and low preference is explained by the 3 predictor variables.
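As a small worked example, the function can be applied to one respondent's factor scores and the canonical correlation squared. The coefficients and the value 0.676 come from the text; the respondent's factor scores are hypothetical. No constant term appears in the function as given, so none is used here.

```python
# Unstandardized coefficients as given in the text.
COEFFS = {
    "basic_functionality": 0.877,
    "value_for_money": 0.658,
    "add_on_features": 0.797,
}

def discriminant_score(factor_scores):
    """Weighted sum of the three factor scores."""
    return sum(COEFFS[name] * factor_scores[name] for name in COEFFS)

# Hypothetical factor scores for one respondent (illustration only).
respondent = {
    "basic_functionality": 0.5,
    "value_for_money": -0.2,
    "add_on_features": 1.0,
}

score = discriminant_score(respondent)
print(f"discriminant score = {score:.4f}")

# Squared canonical correlation: share of variance explained.
canonical_r = 0.676
share_explained = canonical_r ** 2
print(f"share explained = {share_explained:.3f}")
```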
Significance of discriminant model
Wilks' lambda is the ratio of the within-group sum of squares to the total sum of squares. The lower the Wilks' lambda, the higher the significance. Using the chi-square test, we see that the p-value is less than 0.05. Therefore the discriminant function is significant and can be used for further analysis.
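A rough consistency check of this step, assuming a single discriminant function (two groups), for which Wilks' lambda equals 1 minus the squared canonical correlation, together with Bartlett's chi-square approximation. The inputs are the reported canonical correlation of 0.676, 300 cases and 3 predictors; the lambda and chi-square values produced here are derived under these assumptions and are not quoted from the analysis output.

```python
from math import erfc, exp, log, pi, sqrt

def wilks_lambda_from_canonical(r):
    """Wilks' lambda for a single discriminant function."""
    return 1.0 - r ** 2

def bartlett_chi_square(wilks, n, p, g):
    """Bartlett's approximation: n cases, p predictors, g groups."""
    return -(n - 1 - (p + g) / 2) * log(wilks)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

n, p, g = 300, 3, 2
lam = wilks_lambda_from_canonical(0.676)
chi_sq = bartlett_chi_square(lam, n, p, g)
dof = p * (g - 1)  # equals 3 here, matching chi2_sf_df3
p_value = chi2_sf_df3(chi_sq)

print(f"Wilks' lambda = {lam:.3f}, chi-square = {chi_sq:.1f}, df = {dof}")
print("significant at 0.05:", p_value < 0.05)
```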
Standardized discriminant function
The absolute value of each standardized coefficient reflects the relative contribution of that independent variable in discriminating between the groups:
Basic functionality > Add-on features > Value for money
Assessing classification accuracy
Mean discriminant scores of the low preference and high preference groups
To calculate the cutoff score, we use
C = (n2·Y1 + n1·Y2) / (n1 + n2)
Y1 and Y2 = mean discriminant scores of the low preference and high preference groups
n1 and n2 = sizes of the low preference and high preference groups
Cutoff score = 0.246
Therefore, a case with a discriminant score greater than 0.246 is classified as high brand preference, and a case with a score lower than 0.246 is classified as low brand preference.
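The cutoff rule can be sketched as follows. The group sizes and mean discriminant scores used to exercise the formula are hypothetical, since the actual group means and sizes are not listed in the text; only the cutoff of 0.246 is taken from it.

```python
def cutoff_score(n1, y1, n2, y2):
    """Weighted cutoff: each group mean is weighted by the other group's size."""
    return (n2 * y1 + n1 * y2) / (n1 + n2)

def classify(score, cutoff):
    """Scores above the cutoff go to the high preference group."""
    return "high" if score > cutoff else "low"

# Hypothetical group sizes and mean discriminant scores (illustration only).
n_low, y_low = 100, -1.0
n_high, y_high = 200, 0.5
c = cutoff_score(n_low, y_low, n_high, y_high)
print(f"hypothetical cutoff = {c}")

# With equal group sizes the cutoff reduces to the midpoint of the two means.
midpoint = cutoff_score(50, -1.0, 50, 1.0)

# Classification with the cutoff reported in the text.
REPORTED_CUTOFF = 0.246
print(classify(0.30, REPORTED_CUTOFF), classify(0.10, REPORTED_CUTOFF))
```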
Hit ratio: number of correct predictions / total number of cases = 236/300 ≈ 78.7% (stated as 79.3%, which would correspond to 238 correct predictions)
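The hit ratio is simply the share of cases whose predicted group matches the actual group. A minimal sketch, with hypothetical labels for the function demonstration and the counts from the text for the arithmetic:

```python
def hit_ratio(actual, predicted):
    """Fraction of cases where the predicted group equals the actual group."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical labels for five respondents (illustration only).
actual = ["low", "low", "high", "high", "high"]
predicted = ["low", "high", "high", "high", "low"]
example_ratio = hit_ratio(actual, predicted)
print(f"example hit ratio = {example_ratio:.1%}")

# Counts stated in the text: 236 correct predictions out of 300 cases.
stated_ratio = 236 / 300
print(f"236 / 300 = {stated_ratio:.1%}")
```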