Written by Thomas M. Haladyna in Developing and Validating Multiple-Choice Test Items. New Jersey: Lawrence Erlbaum Associates, Inc., 2004, pp. 67-92.
A test item is the basic unit of observation in any test. A test item usually contains a statement that elicits a test taker response. That response is scorable, usually 1 for a correct response and 0 for an incorrect response, or the response might be placed on a rating scale from low to high.
A test is a measuring device intended to describe numerically the degree or amount of learning under uniform, standardized conditions. In educational testing, most tests contain a single item or set of test items intended to measure a domain of knowledge or skills or a cognitive ability. In the instance of the latter, a single test item might be a writing prompt or a complex mathematics problem that may be scored by one or more judges using one or more traits and associated rating scales. Responses to a single test item or a collection of test items are scorable. The use of scoring rules helps create a test score that is based on the test taker's responses to these test items.
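The scoring rule described above (1 for a correct response, 0 for an incorrect one, summed into a test score) can be sketched in a few lines of Python. This is only an illustration: the item names, the answer key, and the responses are hypothetical, and the choice to score an omitted item as 0 is an assumption rather than something the text specifies.

```python
from typing import Mapping


def score_responses(key: Mapping[str, str], responses: Mapping[str, str]) -> dict[str, int]:
    """Score each item 1 if the response matches the key, otherwise 0.

    Assumption: an item with no response is scored 0.
    """
    return {item: int(responses.get(item) == correct) for item, correct in key.items()}


# Hypothetical three-item answer key and one test taker's responses.
answer_key = {"item1": "C", "item2": "A", "item3": "B"}
responses = {"item1": "C", "item2": "B", "item3": "B"}

item_scores = score_responses(answer_key, responses)
total_score = sum(item_scores.values())

print(item_scores)  # {'item1': 1, 'item2': 0, 'item3': 1}
print(total_score)  # 2
```

A rating-scale item, such as a writing prompt scored by judges, would need a different rule; the sketch covers only right/wrong scoring.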
I. Conventional Multiple-Choice
The most common Multiple-Choice (MC) format is conventional. We have three variations. Each variation has three parts:
1. a stem
The stem is the stimulus for the response. The stem should provide a complete idea of the knowledge to be indicated in selecting the right answer. The first item in Example 1.1 shows the question format. The second item shows the incomplete stem (a partial sentence) format. The third item shows the best answer format.
2. the correct choice
The correct choice is undeniably the one and only right answer. In the question format, the correct choice can be a word, phrase, or sentence. In some rare circumstances, it can be a paragraph or even a drawing or photograph (if the distractors are also paragraphs, drawings, or photographs). However, the use of paragraphs, drawings, photographs, and the like makes the administration of the item inefficient. With the incomplete stem, the second part of the sentence is the option, and one of these is the right answer. With the best-answer format, all the options are correct, but only one is unarguably the best.
3. several wrong answers, called foils, misleads, or distractors
Distractors are the most difficult part of the test item to write. A distractor is an unquestionably wrong answer. Each distractor must be plausible to test takers who have not yet learned the knowledge or skill that the test item is supposed to measure. To those who possess the knowledge asked for in the item, the distractors are clearly wrong choices. Each distractor should resemble the correct choice in grammatical form, style, and length. Subtle or blatant clues that give away the correct choice should always be avoided. The number of distractors required for the conventional MC item is a matter of some controversy (Haladyna & Downing, 1993). When analyzing a variety of tests, Haladyna and Downing (1993) found that most items had only one or two "working" distractors. They concluded that three options (a right answer and two distractors) was natural. Few items had three working distractors.
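One way to make the idea of a "working" distractor concrete is to count how often each option is chosen. The sketch below is only an illustration: the 5% cutoff (`min_share`) is an assumption chosen for the example, not a figure given in this text, and the response counts are hypothetical.

```python
def working_distractors(counts, correct, min_share=0.05):
    """Return the distractors chosen by at least `min_share` of test takers.

    `counts` maps each option label to the number of test takers who chose it.
    The 5% default is an illustrative assumption, not a rule from the text.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        option
        for option, n in counts.items()
        if option != correct and n / total >= min_share
    ]


# Hypothetical counts for a four-option item answered by 200 test takers.
option_counts = {"A": 30, "B": 4, "C": 160, "D": 6}

print(working_distractors(option_counts, correct="C"))  # ['A']
```

In this hypothetical item only one distractor attracts a meaningful share of responses, so the item behaves like a two-option item even though four options are printed.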
Example 1.1 Question Format
Who is John Galt? .................... stem
A. A rock star ....................... foil or distractor
B. A movie actor ..................... foil or distractor
C. A character in a book ............. correct choice
Incomplete Stem (Partial Sentence)
John Galt is a character in an Ayn Rand novel who is remembered for his
A. integrity.
B. romantic tendencies.
C. courage.
Best Answer
Which is the most effective safety feature in your car?
A. Seat belt
B. Front air bag
C. Anti-lock braking system
II. Matching
A popular variation of the conventional MC is the matching format. We use this format when we have a set of options that seems useful for two or more items. The matching format begins with a set of options at the top followed by a set of stems below. The instructions that precede the options and stems tell the test taker how to respond and where to mark answers. As shown in Example 1.2, we have five options and six statements. We could easily expand the list of six statements into a longer list, which makes the set of items more comprehensive in testing student learning. In a survey of current measurement textbooks, Haladyna et al. (2002) discovered that every measurement textbook they surveyed recommended the matching format. It is interesting that there is no cited research on this format in any of these textbooks or prior reviews of research on item formats.
Example 1.2 Mark your answer on the answer sheet.
For each item, select the correct answer from the options provided below.
A. Minnesota
B. Illinois
C. Wisconsin
D. Nebraska
E. Iowa
1. Home state of the Hawkeyes
2. Known for its cheese heads
3. Land of many lakes
4. Cornhuskers country
5. The largest of these states
6. Contains Cook County
The matching format has many advantages:
1. Matching items are easy to construct.
2. The presentation of items is compact. The example just provided could be expanded to produce as many as 30 items on a single page.
3. This format is popular and widely accepted.
4. Matching lends itself nicely to testing understanding of concepts, principles, and procedures.
5. Matching is efficient based on the amount of student testing time needed to answer a set of matching test items.
6. The options do not have to be repeated. If we reformatted this into conventional MC, it would require the repeating of the five options for each stem.
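The compactness claimed in points 2 and 6 can be checked with simple arithmetic on Example 1.2. The sketch below counts printed lines under one assumption: each option and each stem takes one line, and the instruction line is ignored. The function names are hypothetical.

```python
# Options and stems taken from Example 1.2.
options = ["Minnesota", "Illinois", "Wisconsin", "Nebraska", "Iowa"]
stems = [
    "Home state of the Hawkeyes",
    "Known for its cheese heads",
    "Land of many lakes",
    "Cornhuskers country",
    "The largest of these states",
    "Contains Cook County",
]


def matching_line_count(options, stems):
    """Matching format: the option list is printed once, followed by the stems."""
    return len(options) + len(stems)


def conventional_mc_line_count(options, stems):
    """Conventional MC: every stem is followed by its own copy of the options."""
    return len(stems) * (1 + len(options))


print(matching_line_count(options, stems))         # 11
print(conventional_mc_line_count(options, stems))  # 36
```

Adding a seventh stem costs one extra line in the matching format but six in conventional MC, which is why the matching set can grow to many items on a single page.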
III. Extended Matching
An extended-matching (EM) format is an MC variation that uses a long list of options linked to a long list of item stems. According to Case and Swanson (1993), a set of EM items has four components:
1. a theme,
2. a set of options,
3. a lead-in statement, and
4. a set of stems.
The theme focuses the test taker in a context. Example 1.3 shows a generic EM format. The options are possible right answers. This list of options can be lengthy. In fact, the list of options might be exhaustive of the domain of possible right answers. The list of options for an EM item set must also be homogeneous in content.
Example 1.3 A generic extended-matching item
Theme: Neuropsychological tests
Options
A. Cognitive Estimates Test
B. Digit Span
C. Go-No Go Test
D. Mini Mental State Examination
E. National Adult Reading Test
F. Raven's Progressive Matrices
G. Rivermead Behavioural Memory Test
H. Stroop Test
I. Wechsler Memory Scale
J. Wisconsin Card Sorting Test
Lead-in: A 54-year-old man has a year's history of steadily progressive personality changes. He has become increasingly apathetic and appears depressed. His main complaint is increasing frontal headaches. On examination, he has word finding difficulties. EEG shows frontal slowing that is greater on the left.
Which test should you consider?
Stems:
1. You are concerned that he may have an intracranial space-occupying lesion.
2. Test indicates that his current performance IQ is in the low average range.
3. The estimate of his premorbid IQ is 15 points higher than his current performance IQ. It is recommended that he has a full WAIS IQ assessment to measure both performance and verbal IQ. On the WAIS, his verbal IQ is found to be impaired over and above his performance IQ.
Which test is part of the WAIS verbal subtests?
4. An MRI scan shows a large meningioma compressing the dorsolateral prefrontal cortex on the left. Which test result is most likely to be impaired?
The EM format is highly recommended for many good reasons.
1. Items are easy to write.
2. Items can be administered quickly.
3. The cognitive process may be understanding and, in some instances, application of knowledge that we associate with problem solving.
4. These items seem more resistant to cuing, whereas with conventional MC one item can cue another.
5. EM items are more resistant to guessing. Moreover, Haladyna and Downing (1993) showed that conventional MC items seldom have many good distractors; thus, guessing a right answer is more likely with conventional MC.
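The point about guessing can be illustrated with a short calculation. The sketch assumes blind guessing, with every option equally likely to be chosen, which is a simplification. The option counts follow Example 1.1 (three options) and Example 1.3 (ten options); the 30-item test length is hypothetical.

```python
from fractions import Fraction


def chance_of_guessing(n_options: int) -> Fraction:
    """Probability of a correct blind guess when all options are equally likely."""
    if n_options < 1:
        raise ValueError("an item needs at least one option")
    return Fraction(1, n_options)


def expected_chance_score(n_items: int, n_options: int) -> Fraction:
    """Expected number of items answered correctly by blind guessing alone."""
    return n_items * chance_of_guessing(n_options)


print(chance_of_guessing(3))                 # 1/3
print(chance_of_guessing(10))                # 1/10
print(float(expected_chance_score(30, 3)))   # 10.0
print(float(expected_chance_score(30, 10)))  # 3.0
```

If a conventional item has only one working distractor, its effective number of options is closer to two, which raises the chance of a correct guess further.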
IV. Alternate Choice
Alternate Choice (AC) is a conventional MC with only two options. Ebel (1981, 1982), a staunch advocate of this format, argued that many items in achievement testing are either-or, lending themselves nicely to the AC format. Downing (1992) reviewed the research on this format and agreed that AC is viable. Haladyna and Downing (1993) examined more than 1,100 items from four standardized tests and found that many items have a correct answer and only one working distractor. The other distractors were nonfunctioning. They concluded that many of these items were naturally in the AC format. Example 1.4 shows simple AC items.
Example 1.4 Alternate-choice items measuring writing skills.
1. (A-Providing, B-Provided) that all homework is done, you may go to the movie.
2. It wasn't very long (A-before, B-until) Earl called Keisa.
3. Knowledge of (A-preventative, B-preventive) medicine will lengthen your life.
4. All instructions should be written, not (A-oral, B-verbal).
5. Mom divided the pizza (A-between, B-among) her three boys.
The AC has several attractive characteristics and some limitations:
1. The most obvious advantage is that the AC item is easy to write. The item writer only has to think of a right answer and one plausible distractor.
2. The efficiency of this format with respect to printing costs, ease of test construction, and test administration is high.
3. Another advantage is that if the item has only two options, one can assign more AC items to a test per testing period than with conventional MC items. Consequently, the AC format provides better coverage of the content domain.
4. AC items are not limited to recall but can be used to measure understanding, some cognitive skills, and even some aspects of abilities (Ebel, 1982).
V. True-False
The True-False (TF) format has been well established for classroom assessment but is seldom used in standardized testing programs. Like other two-option formats, TF is subject to many abuses. The most common may be a tendency to test recall of trivial knowledge. Example 1.5 shows the use of TF for basic knowledge.
Example 1.5 Examples of true-false items.
Mark A on your answer sheet if true and B if false.
1. The first thing to do with an automatic transmission that does not work is to check the transmission fluid. (A)
2. The major cause of tire wear is poor wheel balance. (B)
3. The usual cause of clutch "chatter" is in the clutch pedal linkage. (A)
4. The distributor rotates at one half the speed of the engine crankshaft. (B)
The advantages of TF items are as follows:
1. TF items are easy to write.
2. TF items can measure important content.
3. TF items can measure different cognitive processes.
4. More TF items can be given per testing period than conventional MC items.
5. TF items are easy to score.
6. TF items occupy less space on the page than other MC formats, therefore minimizing the cost of production.
7. The judgment of a proposition as true or false is realistic.
8. We can reduce reading time.
9. Reliability of test scores is adequate.
VI. Complex Multiple-Choice
This item format offers test takers three choices regrouped into four options, as shown in Example 1.6. The Educational Testing Service first introduced this format, and the National Board of Medical Examiners later adopted it for use in medical testing (Hubbard, 1978). Because many items used in medical and health professions testing programs had more than one right answer, complex MC permits the use of one or more correct options in a single item. Because each item is scored either right or wrong, it seems sensible to set out combinations of right and wrong answers in an MC format where only one choice is correct.
Example 1.6 Complex multiple-choice item.
Which actors appeared in the movie Lethal Weapon 10?
1. Mel Gibson
2. Danny Glover
3. Vin Diesel
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1, 2, and 3
Complex MC was popular in formal testing programs, but its popularity is justifiably waning. Albanese (1992), Haladyna (1992b), and Haladyna and Downing (1989b) gave several reasons to recommend against its use:
1. Complex MC items may be more difficult than comparable single-best answer MC.
2. Having partial knowledge, knowing that one option is absolutely correct or incorrect, helps the test taker identify the correct option by eliminating distractors. Therefore, test-taking skills have a greater influence on test performance than intended.
3. This format produces items with lower discrimination, which in turn lowers test score reliability.
4. The format is difficult to construct and edit.
5. The format takes up more space on the page, which increases the page length of the test.
6. The format requires more reading time, thus reducing the number of items of this type one might put in a test. Such a reduction negatively affects the sampling of content, therefore reducing the validity of interpretations and uses of test scores.
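Reason 2 in the list above, that partial knowledge lets a test taker eliminate options, can be illustrated with the option structure of Example 1.6. The sketch makes no claim about which option is keyed as correct; the two partial-knowledge scenarios are hypothetical, and `surviving_options` is a name introduced here for illustration.

```python
# Options from Example 1.6, each expressed as the set of numbered choices it asserts.
options = {
    "A": {1, 2},
    "B": {2, 3},
    "C": {1, 3},
    "D": {1, 2, 3},
}


def surviving_options(options, known_true=frozenset(), known_false=frozenset()):
    """Keep only the options consistent with what the test taker is sure of.

    An option survives if it includes every choice known to be right
    and none of the choices known to be wrong.
    """
    return [
        label
        for label, choices in options.items()
        if known_true <= choices and not (known_false & choices)
    ]


# Hypothetical: the test taker is sure only that choice 3 is wrong.
print(surviving_options(options, known_false={3}))  # ['A']

# Hypothetical: the test taker is sure only that choice 1 is right.
print(surviving_options(options, known_true={1}))   # ['A', 'C', 'D']
```

In the first scenario, certainty about a single choice is enough to isolate one option without any judgment about choices 1 and 2, which is how test-taking skill can substitute for knowledge in this format.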
VII. Multiple True-False
The Multiple True-False (MTF) format, which is sometimes referred to as Type X, has much in common with the TF format. The distinguishing characteristic is that TF items should be nonhomogeneous in content and cognitive demand, whereas the items in an MTF set have much in common and usually derive their commonality from a lead-in statement, as with the EM format. Example 1.7 names a book read by the class and offers five statements that may be applicable to the book. Each student has to link the statement to the book plausibly. Some statements are true and others are false.
Example 1.7 Multiple True-False
The Lion, the Witch, and the Wardrobe by C. S. Lewis can best be summarized by saying:
1. A penny saved is a penny earned.
2. If you give them an inch, they will take a mile.
3. Good will always overcome evil.
4. Do not put off until tomorrow what you can do today.
5. Do not put all your eggs in one basket.
The advantages of the MTF format are as follows:
1. This format avoids the disadvantages of the complex MC format.
2. Recent research has shown that the MTF item format is effective as to reliability and validity.
3. This format is efficient in item development, examinee reading time, and the number of questions that can be asked in a fixed time.
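The contrast with complex MC can be sketched by scoring each MTF statement as its own true-false judgment. Everything specific below is an assumption for illustration: the text does not give a key for Example 1.7, so the key and the student's judgments are hypothetical, as is the choice to score a missing judgment as 0.

```python
# Hypothetical key for the five statements in Example 1.7 (not given in the text).
mtf_key = {1: False, 2: False, 3: True, 4: False, 5: False}

# Hypothetical true/false judgments from one student.
student_judgments = {1: False, 2: True, 3: True, 4: False, 5: False}


def score_mtf(key, judgments):
    """Score each statement 1 if the judgment matches the key, otherwise 0."""
    return {n: int(judgments.get(n) == truth) for n, truth in key.items()}


statement_scores = score_mtf(mtf_key, student_judgments)
mtf_total = sum(statement_scores.values())

# For contrast: a single right/wrong score, as when the whole set is one item.
all_or_nothing = int(all(statement_scores.values()))

print(statement_scores)  # {1: 1, 2: 0, 3: 1, 4: 1, 5: 1}
print(mtf_total)         # 4
print(all_or_nothing)    # 0
```

One lead-in yields five scored responses here, whereas a single right/wrong score over the same content would record only one data point.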
References
Albanese, M. A. (1992). Type K items. Educational Measurement: Issues and Practice, 12, 28-33.
Case, S. M., & Swanson, D. B. (1993). Extended matching items: A practical alternative to free response questions. Teaching and Learning in Medicine, 5(2), 107-115.
Downing, S. M. (1992). True-false and alternate-choice item formats: A review of research. Educational Measurement: Issues and Practice, 11, 27-30.
Ebel, R. L. (1981, April). Some advantages of alternate-choice test items. Paper presented at the annual meeting of the National Council on Measurement in Education, Los Angeles.
Ebel, R. L. (1982). Proposed solutions to two problems of test construction. Journal of Educational Measurement, 19, 267-278.
Haladyna, T. M. (1992b). The effectiveness of several multiple-choice formats. Applied Measurement in Education, 5, 73-88.
Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple-choice test item? Educational and Psychological Measurement, 53, 999-1010.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-334.
Hubbard, J. P. (1978). Measuring medical education: The tests and experience of the National Board of Medical Examiners (2nd ed.). Philadelphia: Lea and Febiger.