By Sharon McDonald and Helen M. Edwards
Who Should Test Whom? Examining the use and abuse of personality tests in software engineering.
Illustration by Robert Neubecker
COMMUNICATIONS OF THE ACM January 2007/Vol. 50, No. 1
The construction of software engineering teams, the interaction between members, and how individual personalities influence these, has been a concern from the 1960s to the present day [5]. Nevertheless, despite claims from leading figures in the field that it is fundamentally people that make the difference between software success and failure, a corpus of knowledge and good practice has failed to emerge. While there have been some attempts to investigate these issues through the application of psychometric tests, the issue of what personality analysis can or cannot offer software engineering is still open for debate [6, 9]. In this article we argue that the lack of progress in this field is due in part to the inappropriate use of psychological tests, frequently coupled with basic misunderstandings of personality theory by those who use them. To support this case we will present our analysis of papers that focus on the empirical use of personality tests in a software engineering context. Our analysis is supported by the expertise of the first author, who is both a chartered psychologist and a trained administrator qualified in the use of MBTI and 16PF psychometric tests. We conclude with a set of recommendations for test application and use for researchers, participants, and readers.
ANALYZING PERSONALITY IN SOFTWARE ENGINEERING RESEARCH
We surveyed papers published in the software engineering field relevant to the topic of personality testing, using digital libraries. This process generated 40 papers published between 1984 and 2004. From this pool 13 distinct papers were identified that focused on the empirical use of personality tests in a software engineering context: this subset is used to illustrate our arguments (osiris.sunderland.ac.uk/~cs0hed/CACMdata provides access to the full data set). Our analysis of these papers concentrates on examining test selection to identify whether reliable and valid instruments have been used, whether the test chosen is appropriate for the purpose, and the extent to which the personality testing process used is explicitly reported and discussed. It is our contention that as a minimum a paper must account for these issues if there is to be any confidence in the resultant data analysis.
The majority of the papers surveyed (25 out of 40) focused on the Myers-Briggs Type Indicator (MBTI); we will therefore confine our discussion to this tool. The MBTI classifies personality in terms of people’s preferred ways of operating in the world. It categorizes individuals into one of 16 personality types. These types are derived from people’s expressed preferences on four functions: (E)xtroversion vs. (I)ntroversion, (S)ensing vs. i(N)tuition, (T)hinking vs. (F)eeling, and (J)udging vs. (P)erceiving [2]. Each type has a number of positive features: there is no ideal type, they are all equally valued. The eight functions and their focus are summarized in the table here.
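As a small illustration of how the 16 types arise from the four preference pairs, the following Python sketch enumerates every four-letter type code. The names DICHOTOMIES and all_types are our own and purely illustrative; they are not part of the MBTI or of any published tool. The letter N stands for Intuition, as in the MBTI type codes (for example, NT).

```python
from itertools import product

# The four preference pairs described in the article, one letter per pole.
# N denotes Intuition, since I is already used for Introversion.
DICHOTOMIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def all_types():
    """Return every four-letter type code, taking one letter from each pair."""
    return ["".join(letters) for letters in product(*DICHOTOMIES)]


types = all_types()
print(len(types))                         # 2 * 2 * 2 * 2 = 16 combinations
print("ISTJ" in types, "ENTP" in types)   # both type codes are among them
```

The sketch only shows the combinatorics of the type codes. It says nothing about how a preference is measured, which is the job of the validated instrument itself.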
WHAT CAN PERSONALITY TESTING OFFER SOFTWARE ENGINEERING? DISPELLING THE MYTHS
In general, the MBTI has been used within software engineering research in one of two ways: to discover the personality type(s) that most typify good software engineers, or to identify the makeup of software development teams that are likely to work well together, or exhibit tensions. Capretz [3] studied the MBTI types of 100 software engineers and found the largest type was ISTJ. While Capretz acknowledges there is no link between type and performance, and that other factors have a bearing on career choice, he goes on to state that these findings are important for employers looking for software engineering professionals. More recently, the MBTI was used to investigate the link between personality type and a code review task with a sample of 64 students [4]. In this study those with an NT (Intuition-Thinking) preference were seen to perform the task better than other types: the largest single type was ENTP. The authors expressed surprise as their results conflicted with those of Capretz. However, these findings do not tell us a lot about the ideal or even adequate software engineer, given the fact that type is not normally distributed in the population. As Kerth et al. [9] correctly point out: personality tests cannot identify good software engineers over bad, nor can their results predict “on the job” performance; there is some evidence of the importance of other factors, such as work experience [11].
Where researchers wish to identify the personality factors related to software engineering, or those factors that typify a group of exceptional software engineers, a more appropriate approach would be to use a trait-based instrument (such as the 16PF) where comparisons to a normative sample can be made. The main barrier to this approach would be choosing, or most probably creating, a representative normative sample for comparison.
The relevance of the MBTI to identify the makeup of software development teams has been limited, in that observable behavior is not always related to the underlying type. People can, and do, choose to operate in the non-preferred mode as situations dictate. The MBTI is a tool for the development of self-awareness and, when results are shared, awareness of others. Knowledge of personality type within a team allows people to expect others to react differently from themselves and equips them to cope more constructively with those differences. As such, the MBTI can be used to improve teamwork with the hoped-for byproduct of improved productivity and quality, as long as the test is used properly.
WILL THE REAL MBTI PLEASE STAND UP?
The value of any psychometric instrument is directly related to the techniques used during construction; not all psychometric tests are created equal. Test publishers describe the precise methods of test development, in particular, statistical data relating to test reliability and validity. They do this because to
ignore such issues would render a test worthless: a poor test will yield poor results. However, the importance of this appears to be lost on many of those who use such tests. Unfortunately, the casual reader of many of the articles discussed here would not see the significance of this point, or its likely bearing on the validity of the research, because in the majority of cases details of the specific tests used are glossed over, and in some cases, misrepresented. Even when researchers have used the real MBTI, for example [1, 11], details of the administration process are missing.

Table: The MBTI functions and their focus.
E–I Focus: The way in which we focus our attention and draw energy
  Extroversion (E): Focus on the external world of people and activity.
  Introversion (I): Focus on own inner worlds of thoughts, ideas, and experiences.
S–N Focus: The way in which we take in information
  Sensing (S): Observant of detail, focus on the real and tangible, the here and now.
  Intuition (N): See the bigger picture, drawing relationships between concepts, imagining new possibilities.
T–F Focus: The way in which we make decisions
  Thinking (T): Base decisions on logic and objectivity.
  Feeling (F): Make decisions through empathy, guided by personal values.
J–P Focus: The way in which we deal with the outer world
  Judging (J): Enjoy making plans, methodical and strive to accomplish tasks by self-imposed deadlines.
  Perceiving (P): Like to be flexible and open to change, feel constrained by plans.

Karn and Cowling’s [8] study of the interactions of personality types during software development claims to have used the MBTI to identify the individual personality types of two teams of student software developers. In fact, the MBTI was not used: a later technical report [7] reveals that a freely available test was used (www.humanmetrics.com). While it is claimed that there are “no significant statistical differences between this test and the MBTI” [7], the argument is not convincing. On inspection of www.humanmetrics.com, no data is provided on the methods of test construction, no reliability or validity data, and no MBTI correlation data. Moreover, in our opinion, the content and style of the site itself is hardly indicative of a professional organization: no surface contact details are provided and there is no firm evidence of credentials. The site offers an interesting range of other free tests including “find your perfect partner,” perhaps the basis for a new slant on the concept of “pair programming”? Although this site might offer some amusement, the potential effects on the subject group are not so lighthearted. A critical part of the administration process is gaining client acceptance and willingness to answer honestly. The testing environment in this case can in no way have guaranteed that the subjects will have taken the process seriously.
Miller and Yin [10] discuss the use of the MBTI in the construction of software inspection teams. They claim to use the MBTI within this study, but then comment that they use the “standard approach of online specialized questionnaires.” We were interested to discover precisely which version of the MBTI had been used within this study and through personal communication with the authors it was established that rather than using the full MBTI, they had in fact constructed their own test: no details of test construction and validity were provided.
You might ask: So what? Well, the development of a robust personality measure is a time-consuming, iterative process that can take many years, not least because personality measures present particular problems during construction. For a test to be of any value it must be both reliable and valid. Test reliability is the extent to which a test is consistent within itself, and over time. That is, the degree to which a test will give the same score or personality type for an individual on retesting. Test validity is the extent to which a test measures what it is intended to measure. Reputable tests such as the MBTI provide statistical data on these factors and details of the methods and samples used to gather this data. To ignore these factors when choosing a test will increase the possibility of acquiring misleading data.
In addition, respondents may attempt to distort their profiles; for example, by responding to items in ways they believe will create a favorable impression. Care must be taken therefore during item development to limit the insight a respondent may have, and to ensure that one pole of the preference does not appear more appealing or acceptable than the other. Standardized tests such as the MBTI are developed in a way that will limit the effects of such response sets. However, this peace of mind comes at a price: tests such as the MBTI can be relatively expensive to purchase. Freely available tests generally do not provide data on reliability and validity. Nor do they offer an insight into the test construction process, nor comment on the possible effects of response sets and how the test design limits these effects. Taken together, the two issues of the lack of detail provided on test construction, and the absence of validity and reliability data, severely limit our ability to trust the results of such tests. A test is worthless if we cannot be sure that it measures what it is supposed to measure, and its results are consistent over time.
TEST ADMINISTRATION AND FEEDBACK
All tests, including the MBTI, have a degree of error in their accuracy, and this error may be amplified by