BHAVAN’S INSTITUTE OF MANAGEMENT, MYSORE

SUBJECT: HRM
ASSIGNMENT ON: VALIDITY AND RELIABILITY
SUBMITTED TO: PROF. ROHINI G. SHETTY, HR DEPARTMENT, BHAVAN’S INSTITUTE OF MANAGEMENT, MYSORE
SUBMITTED BY: SRINIVAS D., IV {TRI-SEM}, P.G.PM, BHAVAN’S INSTITUTE OF MANAGEMENT, MYSORE
DATE: 14-10-2008    DAY: TUESDAY
VALIDITY

Introduction
Validity is arguably the most important criterion for the quality of a test. The term validity refers to whether or not the test measures what it claims to measure. On a test with high validity, the items will be closely linked to the test’s intended focus. For many certification and licensure tests, this means that the items will be highly related to a specific job or occupation. If a test has poor validity, then it does not measure the job-related content and competencies it ought to. When this is the case, there is no justification for using the test results for their intended purpose. There are several ways to estimate the validity of a test, including content validity, concurrent validity, and predictive validity. The face validity of a test is sometimes also mentioned.
TYPES OF VALIDITY

Content Validity
While there are several types of validity, the most important type for most certification and licensure programs is probably that of content validity. Content validity is a logical process where connections between the test items and the job-related tasks are established. If a thorough test development process was followed, a job analysis was properly conducted, an appropriate set of test specifications was developed, and item-writing guidelines were carefully followed, then the content validity of the test is likely to be very high. Content validity is typically estimated by gathering a group of subject matter experts (SMEs) to review the test items. Specifically, these SMEs are given the list of content areas specified in the test blueprint, along with the test items intended to be based on each content area. The SMEs are then asked to indicate whether or not they agree that each item is appropriately matched to the content area indicated. Any items that the SMEs identify as being inadequately matched to the test blueprint, or flawed in any other way, are either revised or dropped from the test.
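As a small, hypothetical sketch of how such an SME review might be tallied, the Python snippet below counts the proportion of SMEs agreeing that each item matches its blueprint content area. The item IDs, the five-person panel, and the 80% agreement threshold are all assumptions made for illustration, not part of any standard procedure.

# Hypothetical SME judgements: 1 = "item matches its blueprint content area",
# 0 = "item does not match". A panel of five SMEs per item is assumed.
sme_ratings = {
    "item_01": [1, 1, 1, 1, 1],
    "item_02": [1, 0, 1, 1, 1],
    "item_03": [0, 1, 0, 0, 0],
}

AGREEMENT_THRESHOLD = 0.80  # assumed cut-off; each program would set its own standard

for item, ratings in sme_ratings.items():
    agreement = sum(ratings) / len(ratings)
    action = "keep" if agreement >= AGREEMENT_THRESHOLD else "revise or drop"
    print(f"{item}: {agreement:.0%} SME agreement -> {action}")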
Concurrent Validity
Another important method for investigating the validity of a test is concurrent validity. Concurrent validity is a statistical method using correlation, rather than a logical method. Examinees who are known to be either masters or non-masters on the content measured by the test are identified, and the test is administered to them under realistic exam conditions. Once the tests have been scored, the relationship is estimated between the examinees’ known status as either masters or non-masters and their classification as masters or non-masters (i.e., pass or fail) based on the test. This type of validity provides evidence that the test is classifying examinees correctly. The stronger the correlation, the greater the concurrent validity of the test.
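As a rough illustration of the correlation step, the sketch below applies Pearson's formula to invented 0/1 data (1 = master/pass, 0 = non-master/fail); on binary data this is the phi coefficient. All of the example data are assumptions, not results from any real exam.

from math import sqrt

# Hypothetical data for ten examinees: known status (1 = master, 0 = non-master)
# and the test's classification decision (1 = pass, 0 = fail).
known_status  = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
test_decision = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# On 0/1 data Pearson's r equals the phi coefficient; here it serves as the
# concurrent validity estimate (prints 0.60 for the invented data above).
print(f"concurrent validity estimate: {pearson(known_status, test_decision):.2f}")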
Predictive Validity
Another statistical approach to validity is predictive validity. This approach is similar to concurrent validity, in that it measures the relationship between examinees' performances on the test and their actual status as masters or non-masters. However, with predictive validity, it is the relationship of test scores to an examinee's future performance as a master or non-master that is estimated. In other words, predictive validity considers the question, "How well does the test predict examinees' future status as masters or non-masters?" For this type of validity, the correlation that is computed is between the examinees' classifications as master or non-master based on the test and their later performance, perhaps on the job. This type of validity is especially useful for test purposes such as selection or admissions.
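As a brief hypothetical sketch, the snippet below correlates assumed hiring-test scores with supervisor ratings gathered later on the job; both data sets are invented, and statistics.correlation requires Python 3.10 or later.

import statistics

# Hypothetical data: scores at the time of testing and supervisor ratings
# collected months later for the same eight examinees.
test_scores       = [62, 71, 55, 80, 90, 67, 74, 58]
later_job_ratings = [3.1, 3.8, 2.6, 4.2, 4.6, 3.4, 3.9, 2.9]

# Pearson correlation between test performance and future job performance.
r = statistics.correlation(test_scores, later_job_ratings)
print(f"predictive validity estimate: {r:.2f}")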
Face Validity
One additional type of validity that you may hear mentioned is face validity. Like content validity, face validity is determined by a review of the items and not through the use of statistical analyses. Unlike content validity, face validity is not investigated through formal procedures and is not determined by subject matter experts. Instead, anyone who looks over the test, including examinees and other stakeholders, may develop an informal opinion as to whether or not the test is measuring what it is supposed to measure. While it is clearly of some value to have the test appear to be valid, face validity alone is insufficient for establishing that the test is measuring what it claims to measure. A well-developed exam program will include formal studies into other, more substantive types of validity.
Summary
The validity of a test is critical because, without sufficient validity, test scores have no meaning. The evidence you collect and document about the validity of your test is also your best legal defense should the exam program ever be challenged in a court of law. While there are several ways to estimate validity, for many certification and licensure exam programs the most important type of validity to establish is content validity.
RELIABILITY

Introduction
Reliability is one of the most important elements of test quality. It has to do with the consistency, or reproducibility, of an examinee's performance on the test. For example, if you were to administer a test with high reliability to an examinee on two occasions, you would be very likely to reach the same conclusions about the examinee's performance both times. A test with poor reliability, on the other hand, might result in very different scores for the examinee across the two test administrations. If a test yields inconsistent scores, it may be unethical to take any substantive actions on the basis of the test. There are several methods for computing test reliability, including test-retest reliability, parallel forms reliability, decision consistency, internal consistency, and interrater reliability. For many criterion-referenced tests, decision consistency is often an appropriate choice.
Types of Reliability

Test-Retest Reliability
To estimate test-retest reliability, you must administer a test form to a single group of examinees on two separate occasions. Typically, the two separate administrations are only a few days or a few weeks apart; the time should be short enough so that the examinees' skills in the area being assessed have not changed through additional learning. The relationship between the examinees' scores from the two different administrations is estimated, through statistical correlation, to determine how similar the scores are. This type of reliability demonstrates the extent to which a test is able to produce stable, consistent scores across time.
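As a minimal sketch of the correlation step, the snippet below uses invented scores from two administrations of the same form; statistics.correlation requires Python 3.10 or later.

import statistics

# Hypothetical scores for the same eight examinees on two administrations of
# the same test form, a few weeks apart.
first_administration  = [78, 65, 90, 55, 82, 70, 61, 88]
second_administration = [80, 63, 87, 58, 85, 72, 60, 90]

# Values near 1.0 indicate stable, consistent scores across time.
r = statistics.correlation(first_administration, second_administration)
print(f"test-retest reliability estimate: {r:.2f}")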
Parallel Forms Reliability
Many exam programs develop multiple, parallel forms of an exam to help provide test security. These parallel forms are all constructed to match the test blueprint, and the parallel test forms are constructed to be similar in average item difficulty. Parallel forms reliability is estimated by administering both forms of the exam to the same group of examinees. While the time between the two test administrations should be short, it does need to be long enough so that examinees' scores are not affected by fatigue. The examinees' scores on the two test forms are correlated in order to determine how similarly the two test forms function. This reliability estimate is a measure of how consistent examinees’ scores can be expected to be across test forms.
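The computation mirrors the test-retest sketch above, except that the two score columns now come from the two parallel forms; the scores below are again invented for illustration.

import statistics

# Hypothetical scores for the same eight examinees on Form A and Form B.
form_a_scores = [74, 82, 59, 91, 66, 78, 85, 70]
form_b_scores = [71, 84, 61, 89, 68, 75, 88, 69]

r = statistics.correlation(form_a_scores, form_b_scores)
print(f"parallel forms reliability estimate: {r:.2f}")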
Decision Consistency
In the descriptions of test-retest and parallel forms reliability given above, the consistency or dependability of the test scores was emphasized. For many criterion-referenced tests (CRTs), a more useful way to think about reliability may be in terms of examinees’ classifications. For example, a typical CRT will result in an examinee being classified as either a master or non-master; the examinee will either pass or fail the test. It is the reliability of this classification decision that is estimated in decision consistency reliability. If an examinee is classified as a master on both test administrations, or as a non-master on both occasions, the test is producing consistent decisions. This approach can be used either with parallel forms or with a single form administered twice in test-retest fashion.
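A minimal sketch of a decision consistency estimate, using hypothetical pass/fail decisions from two administrations; the statistic shown is simply the proportion of examinees classified the same way both times (more refined indices, such as Cohen's kappa, could be substituted).

# Hypothetical classification decisions (1 = master/pass, 0 = non-master/fail)
# from two administrations -- either two parallel forms or a repeated form.
first_decision  = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
second_decision = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

# Proportion of examinees receiving the same decision on both occasions.
consistent = sum(a == b for a, b in zip(first_decision, second_decision))
print(f"decision consistency: {consistent / len(first_decision):.2f}")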
Internal Consistency
The internal consistency measure of reliability is frequently used for norm-referenced tests (NRTs). This method has the advantage of being able to be conducted using a single form given at a single administration. The internal consistency method estimates how well the set of items on a test correlate with one another; that is, how similar the items on a test form are to one another. Many test analysis software programs produce this reliability estimate automatically. However, two common differences between NRTs and CRTs make this method of reliability estimation less useful for CRTs. First, because CRTs are typically designed to have a much narrower range of item difficulty and examinee scores, the value of the reliability estimate will tend to be lower. Additionally, CRTs are often designed to measure a broader range of content; this results in a set of items that are not necessarily closely related to each other. This aspect of CRT test design will also produce a lower reliability estimate than would be seen on a typical NRT.
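One widely used internal consistency coefficient is Cronbach's alpha, computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies this formula to a small invented matrix of scored responses; the data are purely illustrative.

from statistics import pvariance

# Hypothetical scored responses: each row is one examinee, each column is one
# item (1 = correct, 0 = incorrect). Five examinees, four items.
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

k = len(responses[0])  # number of items
item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
total_scores = [sum(row) for row in responses]

# Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total scores))
alpha = (k / (k - 1)) * (1 - sum(item_variances) / pvariance(total_scores))
print(f"Cronbach's alpha: {alpha:.2f}")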
Interrater Reliability
All of the methods for estimating reliability discussed thus far are intended to be used for objective tests. When a test includes performance tasks, or other items that need to be scored by human raters, then the reliability of those raters must be estimated. This reliability method asks the question, "If multiple raters scored a single examinee's performance, would the examinee receive the same score?" Interrater reliability provides a measure of the dependability or consistency of scores that might be expected across raters.
Summary
Test reliability is the aspect of test quality concerned with whether or not a test produces consistent results. While there are several methods for estimating test reliability, for objective CRTs the most useful types are probably test-retest reliability, parallel forms reliability, and decision consistency. A type of reliability that is more useful for NRTs is internal consistency. For performance-based tests, and other tests that use human raters, interrater reliability is likely to be the most appropriate method.