Validity and Reliability (2)
Written by Ari Julianto
I. Reliability
Reliability is defined as the extent to which a questionnaire, test, observation or any other measurement procedure produces the same results on repeated trials. In short, it is the stability or consistency of scores over time or across raters. Keep in mind that reliability pertains to scores, not people.
Thus, in research we would never say that someone was reliable. As an example, consider judges in a platform diving competition. The extent to which they agree on the scores for each contestant is an indication of reliability. Similarly, the degree to which an individual’s responses (i.e., their scores) on a survey would stay the same over time is also a sign of reliability.
An important point to understand is that a measure can be perfectly reliable and yet not be valid. A research example of this phenomenon would be a questionnaire designed to assess job satisfaction that asked questions such as, “Do you like to watch ice hockey games?”, “What do you like to eat more, pizza or hamburgers?” and “What is your favorite movie?”. As you can readily imagine, the responses to these questions would probably remain stable over time, thus demonstrating highly reliable scores. However, are the questions valid when one is attempting to measure job satisfaction? Of course not, as they have nothing to do with an individual’s level of job satisfaction.
There are three aspects of reliability, namely: equivalence, stability and internal consistency (homogeneity). It is important to understand the distinction among these three, as it will guide one in choosing the proper assessment of reliability for a given research protocol.
a. Equivalence refers to the amount of agreement between two or more instruments that are administered at nearly the same point in time. Equivalence is measured through a parallel forms procedure in which one administers alternative forms of the same measure to either the same group or a different group of respondents.
b. Stability is said to occur when the same or similar scores are obtained with repeated testing of the same group of respondents. In other words, the scores are consistent from one time to the next. Stability is assessed through a test-retest procedure that involves administering the same measurement instrument to the same individuals under the same conditions after some period of time. Test-retest reliability is estimated with correlations between the scores at Time 1 and those at Time 2 (to Time x).
c. Internal consistency concerns the extent to which items on the test or instrument are measuring the same thing. If, for example, you are developing a test to measure organizational commitment, you should determine the reliability of each item. If the individual items are highly correlated with each other, you can be highly confident in the reliability of the entire scale. The appeal of an internal consistency index of reliability is that it is estimated after only one test administration and therefore avoids the problems associated with testing over multiple time periods.
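To make the test-retest procedure in (b) concrete, here is a minimal sketch in Python of estimating stability as the Pearson correlation between Time 1 and Time 2 scores. The function name pearson_r and the two score lists are hypothetical and made up for illustration; they do not come from any real study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same six respondents at two points in time.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 3))
```

A coefficient close to 1 suggests that respondents kept roughly the same relative standing from one administration to the next. The same function could be applied to scores from two parallel forms to estimate equivalence as described in (a).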
Internal consistency is estimated via the split-half reliability index, the coefficient alpha index (Cronbach, 1951) or the Kuder-Richardson formula 20 (KR-20) index (Kuder & Richardson, 1937). The split-half estimate entails administering the test once to a group of individuals, dividing the items into two parts (e.g., odd/even items or first half of the items/second half of the items) and correlating the scores on the two halves. Coefficient alpha can be interpreted as the average of all possible split-half estimates, and KR-20 is the special case of coefficient alpha for items that are scored dichotomously (e.g., right/wrong).
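As an illustration of these indices, the sketch below computes coefficient alpha and an odd/even split-half correlation from a single test administration. It assumes the usual variance form of alpha, k / (k - 1) * (1 - sum of item variances / variance of total scores), and uses population variances throughout. The function names and the response data are hypothetical.

```python
from math import sqrt
from statistics import pvariance


def cronbach_alpha(rows):
    """Coefficient alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*rows))  # regroup the scores by item
    item_var_sum = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(rows):
    """Correlate each respondent's odd-item total with the even-item total."""
    odd_totals = [sum(r[0::2]) for r in rows]   # items 1, 3, ...
    even_totals = [sum(r[1::2]) for r in rows]  # items 2, 4, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical data: five respondents answering four items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
half_r = split_half_r(responses)
print(round(alpha, 3), round(half_r, 3))
```

If every item is scored 0 or 1, the same cronbach_alpha function yields the KR-20 value, because the population variance of a 0/1 item equals the proportion passing times the proportion failing. Note that the raw split-half correlation describes a test only half as long as the full instrument; a length correction such as the Spearman-Brown formula is commonly applied, but it is left out of this sketch.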
REFERENCES
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill.