Class 5 Factor Analysis

© Rohit Vishal Kumar

FACTOR ANALYSIS

INTRODUCTION

Factor analysis (FA) is an exploratory technique applied to a set of observed variables. It seeks the underlying factors (latent variables that are not measured directly) from which the observed variables were generated. For example, an individual's response to the questions on a college entrance test is influenced by underlying variables such as intelligence, years in school, age, emotional state on the day of the test, amount of practice taking tests, and so on. The answers to the questions are the observed variables. The underlying, influential variables are the factors.

Factor analysis is carried out on the correlation matrix of the observed variables. A factor is a weighted average of the original variables. The factor analyst hopes to find a few factors from which the original correlation matrix may be generated.

Usually the goal of factor analysis is to aid data interpretation. The factor analyst hopes to identify each factor as representing a specific theoretical factor. Therefore, many of the reports from factor analysis are designed to aid in the interpretation of the factors.

Another goal of factor analysis is to reduce the number of variables. The analyst hopes to reduce the interpretation of a 200-question test to the study of 4 or 5 factors.

One of the most subtle tasks in factor analysis is determining the appropriate number of factors.

Factor analysis has an infinite number of solutions. If a solution contains two factors, these may be rotated to form a new solution that does just as good a job of reproducing the correlation matrix. Hence, one of the biggest complaints about factor analysis is that the solution is not unique. Two researchers can find two different sets of factors that are interpreted quite differently yet fit the original data equally well.

The program used here (NCSS) provides the principal axis method of factor analysis. The results may be rotated using varimax or quartimax rotation. The factor scores may be stored for further analysis.
HOW MANY FACTORS?

Several methods have been proposed for determining the number of factors that should be kept for further analysis. Some of these methods are discussed below. Remember, however, that important information about possible outliers and linear dependencies may be found in the factors associated with the relatively small eigenvalues, so these should be investigated as well.

Kaiser (1960) proposed dropping factors whose eigenvalues are less than one, since these provide less information than a single variable. Jolliffe (1972) feels that Kaiser's cutoff is too large and suggests a cutoff of 0.7 on the eigenvalues when correlation matrices are analyzed. Other authors note that if the largest eigenvalue is close to one, then holding to a cutoff of one may cause useful factors to be dropped. However, if the largest eigenvalues are several times larger than one, then those near one may be dropped.

Cattell (1966) documented the scree graph, which is described later in these notes. Studying this chart is probably the most popular method for determining the number of factors, but it is subjective, so different people analyzing the same data may reach different results.

Another criterion is to preset a certain percentage of the variation that must be accounted for and then keep enough factors to reach it. Usually, however, this cutoff percentage is used as a lower limit. That is, if the designated number of factors does not account for at least 50% of the variance, the whole analysis is aborted.

ROTATION

Factor analysis finds a set of dimensions (or co-ordinates) in a subspace of the space defined by the set of variables. These co-ordinates are represented as axes. They are orthogonal (perpendicular) to one another. For example, suppose you analyse three variables that are represented in three-dimensional space. Each variable becomes one axis. Now suppose that the data lie near a two-dimensional plane within the three dimensions. A factor analysis of these data should uncover two factors that account for the two dimensions.
You may rotate the axes of this two-dimensional plane while keeping the 90-degree angle between them, just as the blades of a helicopter propeller rotate yet maintain the same angles among themselves. The hope is that rotating the axes will improve your ability to interpret the "meaning" of each factor. Many different types of rotation have been suggested. Most of them were developed for use in factor analysis.
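The claim that a rotated solution fits just as well can be checked numerically. The sketch below is only an illustration: it takes the unrotated two-factor loadings from the SPSS Factor Matrix reported later in these notes, turns the axes by an arbitrary 30 degrees (an assumed angle with no special meaning), and confirms that the reproduced correlations and the communalities are unchanged even though the individual loadings differ. The helper name `rotate` is hypothetical.

```python
import numpy as np

# Unrotated two-factor loadings, copied from the SPSS "Factor Matrix" in these notes.
loadings = np.array([
    [ 0.92834,  0.25323],   # V1
    [-0.30053,  0.79525],   # V2
    [ 0.93618,  0.13089],   # V3
    [-0.34158,  0.78897],   # V4
    [-0.86876, -0.35079],   # V5
    [-0.17664,  0.87116],   # V6
])


def rotate(matrix, degrees):
    """Apply an orthogonal rotation of the two factor axes by the given angle."""
    t = np.radians(degrees)
    rotation = np.array([[np.cos(t), -np.sin(t)],
                         [np.sin(t),  np.cos(t)]])
    return matrix @ rotation


rotated = rotate(loadings, 30.0)  # 30 degrees is an arbitrary illustrative angle

# The matrix reproduced from the factors (L times L transposed) is identical ...
same_fit = np.allclose(loadings @ loadings.T, rotated @ rotated.T)
# ... and so are the communalities (row sums of squared loadings),
same_communalities = np.allclose((loadings ** 2).sum(axis=1),
                                 (rotated ** 2).sum(axis=1))
# even though the individual loadings have changed.
loadings_changed = not np.allclose(loadings, rotated)

print(same_fit, same_communalities, loadings_changed)
```

Any other angle gives the same fit, which is why a criterion such as varimax is needed to pick one rotation out of the infinitely many available.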

The program provides two orthogonal rotation options: varimax and quartimax.

Varimax Rotation: Varimax rotation is the most popular orthogonal rotation technique. In this technique, the axes are rotated to maximise the sum of the variances of the squared loadings within each column of the loadings matrix. Maximising according to this criterion forces the loadings to be either large or small. The hope is that by rotating the factors, you will obtain new factors that are each highly correlated with only a few of the original variables. This simplifies the interpretation of each factor to a consideration of these two or three variables. Another way of stating the goal of varimax rotation is that it clusters the variables into groups; each "group" is actually a new factor. Since varimax seeks to maximise a specific criterion, it produces a unique solution (except for differences in sign). This has added to its popularity.

SCREE PLOT

This is a rough bar plot of the eigenvalues. It enables you to quickly note the relative size of each eigenvalue. Many authors recommend it as a method of determining how many factors to retain. The word scree, first used by Cattell (1966), is usually defined as the rubble at the bottom of a cliff. When using the scree plot, you must determine which eigenvalues form the “cliff” and which form the “rubble.” Cattell & Jaspers (1967) suggest keeping those that make up the cliff plus the first factor of the rubble.

VALIDATING OUTPUT

Phi: This is the Gleason-Staelin redundancy measure of how interrelated the variables are. A value of zero means that there is no correlation among the variables, while a value of one indicates perfect correlation among the variables. Factor analysis is considered worthwhile when the value of Phi lies between 0.50 and 1.00.

Bartlett Test, df, Prob: This is Bartlett’s sphericity test (Bartlett, 1950) for testing the null hypothesis that the correlation matrix is an identity matrix (all correlations are zero). If you get a probability value greater than 0.05, you should not perform a factor analysis on the data. The test is valid for large samples (N > 150). It uses a chi-square distribution with p(p-1)/2 degrees of freedom, where p is the number of variables. Note that this test is only available when you analyse a correlation matrix.
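As a rough illustration of the two validation measures, the sketch below computes them for the toothpaste ratings used in the example that follows. It assumes the usual textbook forms: Phi as the square root of the mean squared off-diagonal correlation, and Bartlett's statistic as -(N - 1 - (2p + 5)/6) ln|R|. These formulas are not spelled out in the notes, so treat them as assumptions; the variable names are illustrative.

```python
import math
import numpy as np

# Toothpaste ratings from the example in these notes (30 respondents, V1..V6).
data = np.array([
    [7, 3, 6, 4, 2, 4], [1, 3, 2, 4, 5, 4], [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5], [1, 2, 2, 3, 6, 2], [6, 3, 6, 4, 2, 4],
    [5, 3, 6, 3, 4, 3], [6, 4, 7, 4, 1, 4], [3, 4, 2, 3, 6, 3],
    [2, 6, 2, 6, 7, 6], [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4],
    [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4],
    [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3],
    [5, 4, 5, 4, 2, 4], [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7],
    [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5],
    [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
])

n, p = data.shape
R = np.corrcoef(data, rowvar=False)  # 6 x 6 correlation matrix of the variables

# Gleason-Staelin redundancy (assumed form): root mean squared off-diagonal correlation.
phi = math.sqrt((np.sum(R ** 2) - p) / (p * (p - 1)))

# Bartlett's sphericity statistic (assumed form) and its degrees of freedom.
_, log_det = np.linalg.slogdet(R)    # natural log of the determinant of R
chi_square = -(n - 1 - (2 * p + 5) / 6) * log_det
df = p * (p - 1) // 2

print(f"Phi = {phi:.6f}")
print(f"Log(Det|R|) = {log_det:.6f}, Bartlett = {chi_square:.2f}, DF = {df}")
```

The values can be compared with the NCSS correlation section later in these notes, which reports Phi=0.473692, Log(Det|R|)=-4.254032, Bartlett Test=111.31 and DF=15. A probability value would additionally need a chi-square tail function, for example `scipy.stats.chi2.sf(chi_square, df)` if SciPy is available.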


WBUT 2009


FACTOR ANALYSIS – EXAMPLE AND OUTPUT

The raw data are taken from the example given in the Factor Analysis chapter of Naresh K. Malhotra’s Marketing Research – An Applied Orientation.

Resp.   V1   V2   V3   V4   V5   V6
  1      7    3    6    4    2    4
  2      1    3    2    4    5    4
  3      6    2    7    4    1    3
  4      4    5    4    6    2    5
  5      1    2    2    3    6    2
  6      6    3    6    4    2    4
  7      5    3    6    3    4    3
  8      6    4    7    4    1    4
  9      3    4    2    3    6    3
 10      2    6    2    6    7    6
 11      6    4    7    3    2    3
 12      2    3    1    4    5    4
 13      7    2    6    4    1    3
 14      4    6    4    5    3    6
 15      1    3    2    2    6    4
 16      6    4    6    3    3    4
 17      5    3    6    3    3    4
 18      7    3    7    4    1    4
 19      2    4    3    3    6    3
 20      3    5    3    6    4    6
 21      1    3    2    3    5    3
 22      5    4    5    4    2    4
 23      2    2    1    5    4    4
 24      4    6    4    6    4    7
 25      6    5    4    2    1    4
 26      3    5    4    6    4    7
 27      4    4    7    2    2    5
 28      3    7    2    6    4    3
 29      4    6    3    7    2    7
 30      2    3    2    4    7    2

The variables are defined as follows:

V1 = Prevents cavities
V2 = Gives shiny teeth
V3 = Provides strong gums
V4 = Provides fresh breath
V5 = Prevents tooth decay
V6 = Gives attractive teeth

30 respondents were asked to rate the above six attributes on a scale of 1 to 7, in response to a question on how important each attribute is when purchasing a toothpaste. The scale used was:

1 = Strongly Disagree
2 = Disagree
3 = Somewhat Disagree
4 = Neither Agree nor Disagree
5 = Somewhat Agree
6 = Agree
7 = Strongly Agree

So, according to the first respondent, the most important attribute influencing the purchase of a toothpaste was its ability to prevent cavities (7), then its ability to provide strong gums (6), followed by its ability to provide fresh breath (4) and to give attractive teeth (4). These were followed by its ability to give shiny teeth (3) and, least of all, its ability to prevent tooth decay (2).

The objective of the factor analysis exercise is to find out which attributes convey a similar meaning and can be clubbed together.

The data entry window for both NCSS and SPSS looks similar and is reproduced below from NCSS


From the Analysis menu in NCSS we choose “Analysis” -> “Multivariate Analysis” -> “Factor Analysis”. On doing so, the factor analysis options selection box opens up, which is shown below:

The choices are as follows:

Variables to be included in factor analysis   –   V1 to V6
Data Input Format                             –   Regular Data
Factor Rotation                               –   Varimax Rotation
Missing Value Estimation                      –   None
Number of factors to extract (1)              –   2
Maximum Iteration (2)                         –   6

(1) SPSS has an option for limiting the number of factors, but by default it generates the full factor analysis, and the onus of choosing the factors then lies with the investigator.

(2) SPSS determines the number of iterations without any intervention from the researcher. NCSS has a default value of 6, which can be increased or decreased as required. Most factor rotations should converge within 5 iterations, but 30 is the best limit to set.

THE OUTPUT FROM NCSS:

Factor Analysis Report

Page/Date/Time   1   27/02/2002 21:28:28
Database         C:\Rohit - Important\Rohit\IIS_WBM\Cases\factor\factor.S0

1. Descriptive Statistics Section

Variables   Count   Mean       Standard Deviation   Communality
V1          30      3.933333   1.981524             0.927653
V2          30      3.900000   1.373392             0.561986
V3          30      4.100000   2.056948             0.836849
V4          30      4.100000   1.373392             0.601356
V5          30      3.500000   1.907336             0.789274
V6          30      4.166667   1.391683             0.720238

Table 1 gives us the descriptive statistics.


2. Correlation Section

Variables   V1          V2          V3          V4          V5          V6
V1           1.000000   -0.053218    0.873090   -0.086162   -0.857637    0.004168
V2          -0.053218    1.000000   -0.155020    0.572212    0.019746    0.640465
V3           0.873090   -0.155020    1.000000   -0.247788   -0.777848   -0.018069
V4          -0.086162    0.572212   -0.247788    1.000000   -0.006582    0.640465
V5          -0.857637    0.019746   -0.777848   -0.006582    1.000000   -0.136403
V6           0.004168    0.640465   -0.018069    0.640465   -0.136403    1.000000

Phi=0.473692   Log(Det|R|)=-4.254032   Bartlett Test=111.31   DF=15   Prob=0.000000

This is the correlation matrix generated by NCSS. Notice that the main diagonal contains 1.00. This matrix forms the basis of input for factor analysis in both the centroid method and the principal component method.

3. Eigenvectors after Varimax Rotation

Variables   Factor1     Factor2
V1           0.591531   -0.123071
V2          -0.128823   -0.527405
V3           0.570093   -0.028183
V4          -0.153863   -0.538050
V5          -0.529954    0.189998
V6          -0.062964   -0.616689

In NCSS we have to specify the number of factors to extract. Initially a large number is chosen, and the number of factors is then decreased in the selection box until we get the desired number of factors and/or NCSS stops complaining that it cannot solve the factor analysis. Table 3 gives us the eigenvectors of the factors after applying the varimax rotation. NCSS does not report the initial factor solution. SPSS, on the other hand, first generates the initial solution and then applies the rotation to give us the rotated factor matrix.

4. Factor Loadings after Varimax Rotation

Variables   Factor1     Factor2
V1           0.962670    0.030345
V2          -0.054005   -0.747709
V3           0.902385    0.150168
V4          -0.090303   -0.770196
V5          -0.884852    0.079443
V6           0.074402   -0.845401

Table 4 gives us the factor loadings for the factors, and Table 5 below gives us the communalities after rotation.

5. Communalities after Varimax Rotation

Variables   Factor1    Factor2    Communality
V1          0.926733   0.000921   0.927653
V2          0.002917   0.559069   0.561986
V3          0.814299   0.022550   0.836849
V4          0.008155   0.593202   0.601356
V5          0.782963   0.006311   0.789274
V6          0.005536   0.714703   0.720238
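The link between Table 4 and Table 5 is simple arithmetic: each entry of Table 5 is the square of the corresponding loading, and the communality is the row sum. The sketch below reproduces this from the NCSS loadings and also assigns every variable to the factor on which its absolute loading is largest. That assignment rule is an assumption about how a grouping could be made, not a documented NCSS procedure.

```python
import numpy as np

variables = ["V1", "V2", "V3", "V4", "V5", "V6"]

# Varimax-rotated loadings copied from NCSS Table 4.
loadings = np.array([
    [ 0.962670,  0.030345],
    [-0.054005, -0.747709],
    [ 0.902385,  0.150168],
    [-0.090303, -0.770196],
    [-0.884852,  0.079443],
    [ 0.074402, -0.845401],
])

squared = loadings ** 2              # per-factor share of each variable's variance
communality = squared.sum(axis=1)    # variance of each variable explained by both factors

# Assumed grouping rule: a variable belongs to the factor with the largest |loading|.
strongest = np.abs(loadings).argmax(axis=1)
groups = {f"Factor{k + 1}": [] for k in range(loadings.shape[1])}
for name, k in zip(variables, strongest):
    groups[f"Factor{k + 1}"].append(name)

for name, row, total in zip(variables, squared, communality):
    print(f"{name}  {row[0]:.6f}  {row[1]:.6f}  {total:.6f}")
print(groups)
```

The printed rows should agree with Table 5 up to rounding in the last digit, and the grouping puts V1, V3 and V5 under Factor1 and V2, V4 and V6 under Factor2.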

6. Factor Structure Summary after Varimax Rotation

Factor1   Factor2
V1        V6
V3        V4
V5        V2

Table 6, as generated by NCSS, is an improvement over SPSS. In Table 6, NCSS tries to show which attributes are covered by Factor 1 and which by Factor 2. SPSS does not make any attempt to club attributes under the various factors; this clubbing is left to the researcher. However, the NCSS output should not be taken as final: the researcher should apply their own judgement and see whether further improvement can be made.

7. Plots Section


[Figure: Factor Scores plot of Score1 (vertical axis, -1.50 to 2.00) against Score2 (horizontal axis, -3.00 to 2.00), with the points labelled by observation number, 1 to 30.]

THE OUTPUT FROM SPSS:

1. Analysis number 1   List-wise deletion of cases with missing values

Variable   Mean      Std Dev   Label
V1         3.93333   1.98152   Prevents Cavities
V2         3.90000   1.37339   Gives Shiny Teeth
V3         4.10000   2.05695   Strengthens Gums
V4         4.10000   1.37339   Freshens Breath
V5         3.50000   1.90734   Prevents Tooth Decay
V6         4.16667   1.39168   Attractive Teeth

Number of Cases = 30

Extraction 1 for analysis 1, Principal Components Analysis (PC)

Table 1 is the descriptive statistics table as generated from the data.

2. Initial Statistics:

Variable   Communality   *   Factor   Eigenvalue   Pct of Var   Cum Pct
V1         1.00000       *   1        2.73119      45.5          45.5
V2         1.00000       *   2        2.21812      37.0          82.5
V3         1.00000       *   3        0.44160       7.4          89.8
V4         1.00000       *   4        0.34126       5.7          95.5
V5         1.00000       *   5        0.18263       3.0          98.6
V6         1.00000       *   6        0.08521       1.4         100.0

PC extracted 2 factors.

Table 2 shows the start of the analysis. Initially SPSS treats each of the six attributes as a factor in its own right, so each variable is assigned an initial communality of 1. The initial eigenvalues are then calculated, and from them the percentage of variation explained by each factor and the cumulative percentage are determined.
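The initial statistics can be reproduced from the correlation matrix alone, and the retention rules discussed under HOW MANY FACTORS? can then be applied to them. The sketch below is one possible way to do this: it copies the correlation matrix from the NCSS correlation section, takes its eigenvalues, and counts how many factors pass Kaiser's cutoff of 1, Jolliffe's cutoff of 0.7, and a preset share of variance. The 80% target is an assumed illustrative value, not one given in the notes.

```python
import numpy as np

# Correlation matrix copied from the NCSS correlation section (order V1..V6).
R = np.array([
    [ 1.000000, -0.053218,  0.873090, -0.086162, -0.857637,  0.004168],
    [-0.053218,  1.000000, -0.155020,  0.572212,  0.019746,  0.640465],
    [ 0.873090, -0.155020,  1.000000, -0.247788, -0.777848, -0.018069],
    [-0.086162,  0.572212, -0.247788,  1.000000, -0.006582,  0.640465],
    [-0.857637,  0.019746, -0.777848, -0.006582,  1.000000, -0.136403],
    [ 0.004168,  0.640465, -0.018069,  0.640465, -0.136403,  1.000000],
])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

pct_of_var = 100 * eigenvalues / eigenvalues.sum()
cum_pct = np.cumsum(pct_of_var)

kaiser_count = int((eigenvalues > 1.0).sum())     # Kaiser (1960): keep eigenvalues above 1
jolliffe_count = int((eigenvalues > 0.7).sum())   # Jolliffe (1972): keep eigenvalues above 0.7
target_pct = 80.0                                 # assumed illustrative variance target
variance_count = int(np.searchsorted(cum_pct, target_pct) + 1)

for k, (ev, pct, cum) in enumerate(zip(eigenvalues, pct_of_var, cum_pct), start=1):
    print(f"{k}  {ev:.5f}  {pct:5.1f}  {cum:5.1f}")
print(kaiser_count, jolliffe_count, variance_count)
```

For these data all three rules point to two factors, which matches the number SPSS extracted.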


3. Factor Matrix:

      Factor 1   Factor 2
V1     0.92834    0.25323
V2    -0.30053    0.79525
V3     0.93618    0.13089
V4    -0.34158    0.78897
V5    -0.86876   -0.35079
V6    -0.17664    0.87116

Table 3 is the final factor matrix without rotation. SPSS extracted two factors, Factor 1 and Factor 2. The factor loadings are provided in the above matrix. If no rotation procedure is selected, Table 4 below is the final output. If a rotation procedure is selected, the final rotated matrix (Table 5) is also generated. Note that SPSS does not provide any clubbing of attributes under factors. It is the work of the researcher to interpret the factor loadings and do the clubbing.

4. Final Statistics:

Variable   Communality   *   Factor   Eigenvalue   Pct of Var   Cum Pct
V1         0.92594       *   1        2.73119      45.5         45.5
V2         0.72274       *   2        2.21812      37.0         82.5
V3         0.89357       *
V4         0.73915       *
V5         0.87779       *
V6         0.79012       *

VARIMAX rotation 1 for extraction 1 in analysis 1 - Kaiser Normalization.
VARIMAX converged in 3 iterations.

Table 4 presents the final summary of the factor analysis. As per the table, Factor 1 and Factor 2 are the two factors that have been extracted, and together they explain 82.5% of the variation present in the data.

5. Rotated Factor Matrix:

      Factor 1   Factor 2
V1     0.96189   -0.02663
V2    -0.05721    0.84821
V3     0.93394   -0.14599
V4    -0.09832    0.85410
V5    -0.93313   -0.08401
V6     0.08337    0.88497

Table 5 is the final factor matrix with rotation. Note that the factor loadings are different from those of the unrotated factor matrix.
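To make the step from the unrotated to the rotated matrix concrete, the sketch below applies a commonly used SVD-based varimax iteration with Kaiser normalization to the unrotated SPSS loadings. It is not SPSS's own routine, and the function names, iteration limit and tolerance are assumptions chosen for illustration.

```python
import numpy as np

# Unrotated loadings copied from the SPSS "Factor Matrix" (Table 3).
unrotated = np.array([
    [ 0.92834,  0.25323],
    [-0.30053,  0.79525],
    [ 0.93618,  0.13089],
    [-0.34158,  0.78897],
    [-0.86876, -0.35079],
    [-0.17664,  0.87116],
])


def kaiser_normalize(loadings):
    """Scale every row to unit length; also return the original row lengths."""
    lengths = np.sqrt((loadings ** 2).sum(axis=1))
    return loadings / lengths[:, None], lengths


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared, row-normalized loadings."""
    normalized, _ = kaiser_normalize(loadings)
    return float(np.var(normalized ** 2, axis=0).sum())


def varimax(loadings, max_iter=100, tol=1e-10):
    """Return (rotated loadings, orthogonal rotation matrix)."""
    normalized, lengths = kaiser_normalize(loadings)
    n_vars, n_factors = normalized.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        current = normalized @ rotation
        # Gradient-like target of the varimax criterion for the current rotation.
        target = current ** 3 - current * (current ** 2).sum(axis=0) / n_vars
        u, s, vt = np.linalg.svd(normalized.T @ target)
        rotation = u @ vt            # closest orthogonal matrix to the target direction
        if s.sum() < previous * (1 + tol):
            break                    # no meaningful improvement: treat as converged
        previous = s.sum()
    # Undo the Kaiser normalization so communalities match the input.
    return (normalized @ rotation) * lengths[:, None], rotation


rotated, rotation = varimax(unrotated)
print(np.round(rotated, 5))
print(varimax_criterion(unrotated), varimax_criterion(rotated))
```

The result should land close to Table 5, although columns may come out swapped or with flipped signs, since varimax is unique only up to such changes. The communalities of each variable are the same before and after rotation.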
