Prepared by: Laserna, Mikee Rose T.
Date of Submission: Dec 14, 2018

RELIABILITY AND VALIDITY OF INTERVIEWS

The reliability of an interview is typically evaluated in terms of the level of agreement between at least two raters who evaluated the same patient or client. The validity of an interview concerns how well the interview measures what it intends to measure.

interrater reliability (or interjudge reliability): The level of agreement between at least two raters who have evaluated the same patient independently. Agreement can refer to consensus on symptoms assigned, diagnoses assigned, and so on.

kappa coefficient: A statistical index of interrater reliability computed to determine how reliably raters judge the presence or absence of a feature or diagnosis.

Reliability

Standardized (structured) interviews with clear scoring instructions will be more reliable than unstructured interviews. The reason is that structured interviews reduce both information variance and criterion variance. Information variance refers to the variation in the questions that clinicians ask, the observations that are made during the interview, and the method of integrating the information that is obtained. Criterion variance refers to the variation in scoring thresholds among clinicians.
test–retest reliability: An index of the consistency of interview scores across some period of time.

Table 6-5: Diagnostic Agreement Between Two Raters (hypothetical data)

                             Rater 2
                       Present      Absent
Rater 1   Present      30 (a)        5 (b)
          Absent        5 (c)       60 (d)
                                              N = 100

Overall agreement = (a + d)/N = (30 + 60)/100 = .90

Kappa = [(a + d)/N - ((a + b)(a + c) + (c + d)(b + d))/N^2] / [1 - ((a + b)(a + c) + (c + d)(b + d))/N^2]

For a 2 x 2 table this is equivalent to:

Kappa = (ad - bc) / [ad - bc + N(b + c)/2] = 1775/2275 = .78
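As a minimal illustrative sketch (not part of the original text; the function name is my own), the overall agreement and kappa for a 2 x 2 agreement table could be computed as follows, using the hypothetical cell counts from Table 6-5:

```python
# Sketch: overall agreement and Cohen's kappa for a 2 x 2
# diagnostic-agreement table (hypothetical counts from Table 6-5).

def agreement_and_kappa(a, b, c, d):
    """a and d are cells where the raters agree; b and c are disagreements."""
    n = a + b + c + d
    observed = (a + d) / n                                        # overall agreement
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

observed, kappa = agreement_and_kappa(a=30, b=5, c=5, d=60)
print(f"Overall agreement = {observed:.2f}")   # 0.90
print(f"Kappa             = {kappa:.2f}")      # 0.78
```

The same calculation applies to test-retest reliability; the counts for Rater 2 would simply be replaced by counts from the second testing occasion.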
Table 6-5 presents a hypothetical data set from a study assessing the reliability of alcoholism diagnoses derived from a structured interview. This example assesses interrater reliability (the level of agreement between two raters), but the calculations would be the same if one wanted to assess test–retest reliability; in that case, the data for Rater 2 would be replaced by data for Testing 2 (the retest). As can be seen, the two raters evaluated the same 100 patients for the presence or absence of an alcoholism diagnosis, using a structured interview. These two raters agreed in 90% of the cases [(30 + 60)/100]. Agreement here refers to coming to the same conclusion: not just agreeing that the diagnosis is present but also that the diagnosis is absent.

The table also presents the calculation for kappa, a chance-corrected index of agreement that is typically lower than overall agreement. The reason for this lower value is that raters will agree on the basis of chance alone in situations where the prevalence rate for a diagnosis is relatively high or relatively low. In the example shown in Table 6-5, the diagnosis of alcoholism is relatively infrequent. Therefore, a rater who always judged the disorder to be absent would be correct (and likely to agree with another rater) in many cases. The kappa coefficient takes into account such instances of agreement based on chance alone and adjusts the agreement index downward accordingly. In general, a kappa value between .75 and 1.00 is considered to reflect excellent interrater agreement beyond chance.

Validity

The validity of any type of psychological measure can take many forms. Content validity refers to the measure's comprehensiveness in assessing the variable of interest. For example, if an interview is designed to measure depression, then we would expect it to contain multiple questions assessing various emotional, cognitive, and physiological aspects of depression. Criterion-related validity refers to the ability of a measure to predict (correlate with) scores on other relevant measures. For example, an interview assessing conduct disorder in childhood may be said to have criterion-related validity to the extent that its scores correlate with measures of peer rejection and aggressive behavior. Discriminant validity refers to the extent to which interview scores do not correlate with measures that are theoretically unrelated to the construct being measured. For example, there is no theoretical reason that a specific phobia (e.g., of heights) should be correlated with level of intelligence. Therefore, a demonstration that the two measures are not significantly correlated would indicate the specific phobia interview's discriminant validity. Construct validity encompasses all of these aspects of validity: the extent to which interview scores correlate with other measures or behaviors in a logical and theoretically consistent way. Demonstrating construct validity involves demonstrating both convergent and discriminant validity. Predictive validity, a form of criterion-related validity, is the extent to which interview scores correlate with scores on other relevant measures administered at some point in the future.
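As an illustrative sketch only (the variable names and data below are invented for illustration, not taken from the text), criterion-related (convergent) validity and discriminant validity could be examined by correlating interview scores with a theoretically related measure and a theoretically unrelated one:

```python
# Hypothetical sketch: checking criterion-related and discriminant validity
# by correlating interview scores with other measures. All data are simulated.
import numpy as np

rng = np.random.default_rng(0)

conduct_interview = rng.normal(size=200)                           # interview scores
peer_rejection = 0.6 * conduct_interview + rng.normal(size=200)    # related criterion measure
iq_score = rng.normal(size=200)                                    # theoretically unrelated measure

# Criterion-related validity: a substantial correlation is expected.
r_criterion = np.corrcoef(conduct_interview, peer_rejection)[0, 1]

# Discriminant validity: a near-zero correlation is expected.
r_discriminant = np.corrcoef(conduct_interview, iq_score)[0, 1]

print(f"r(interview, peer rejection) = {r_criterion:.2f}")
print(f"r(interview, IQ)             = {r_discriminant:.2f}")
```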
Suggestions for Improving Reliability and Validity

1. Whenever possible, use a structured interview. A wide variety of structured interviews exist for conducting intake-admission, case-history, mental status examination, crisis, and diagnostic interviews.

2. If a structured interview does not exist for your purpose, consider developing one. Generate a standard set of questions to be used, develop a set of guidelines to score respondents' answers, administer this interview to a representative sample of subjects, and use the feedback from subjects and interviewers to modify the interview. If nothing else, completing this process will help you better understand what it is you are attempting to assess and will help you become a better interviewer.

3. Whether or not you are using a structured interview, certain interviewing skills are essential: establishing rapport, being an effective communicator, being a good listener, knowing when and how to ask additional questions, and being a good observer of nonverbal behavior.

4. Be aware of the patient's motives and expectancies with regard to the interview. For example, how strong are his or her needs for approval or social desirability?

5. Be aware of your own expectations, biases, and cultural values. Periodically, have someone else assess the reliability of the interviews you administer and score.

THE ART AND SCIENCE OF INTERVIEWING

Becoming a skilled interviewer requires practice. Without the opportunity to conduct real interviews, to make mistakes, and to discuss techniques and strategies with more experienced interviewers, a simple awareness of scientific investigations of interviewing will not by itself confer great skill. Still, research on interviewing serves several important functions. A major one is to make clinicians more humble regarding their "intuitive skills." Research suggests, for example, that prior expectancies can color the interviewer's observations, that implicit theories of personality and psychopathology can influence the focus of an interview, and that the match or
mismatch of interviewer and interviewee in terms of race, age, and gender may influence the course and outcome of the interview. Thus, a number of influences on the interview process have been identified. If we never assess the validity of our diagnoses, if we never check our reliability against someone else, or if we never measure the efficacy of a specific interview technique, then we can easily develop a misplaced confidence that will ultimately be hard on our patients.

It may be true, as some cynics argue, that 10 studies, all purporting to show that "mm-hmm" is no more effective than a nod of the head in expressing interviewer interest, still fail to disprove that in one specific or unique clinical interaction there may indeed be a difference. Although no single interview study will offer an unambiguous solution to an interview problem, these studies have a cumulative effect. Research can offer suggestions about improving the validity of our observations and techniques, shatter some timeworn illusions, and splinter a few clichés. By the sheer cumulative weight of its controlled, scientific approach, research can make interviewers more sensitive and effective. A clinician steeped in both the art and the science of interviewing will be more effective (though hardly more comfortable) than one who is conscious of only one of these dual aspects of interviewing.