Questions and Answers of Measurement Theory in Action
3. What are the advantages and disadvantages of constructed-response items?
4. What are the advantages and disadvantages of free-response items?
8. Why is pretesting of items important in test construction?
9. In what ways does the revised Bloom’s taxonomy differ from the original?
10. Who would be appropriate to fulfill the role of SME for a test designed to assess knowledge of the following?
a. Twelfth-grade mathematics
b. Modern automotive repair
c. American pop culture
1. Jaime followed a number of recommended steps for test development. For each of the following, explain how it assists in the development of a quality test of maximal performance:
a. Consideration of
2. Jaime also followed a number of recommended steps to score this essay test. For each of the following, explain how it helps improve reliability:
a. Recording of student identification numbers
3. Jaime unwittingly included writing ability in his scoring. Is writing ability an appropriate test component for a university class in forensic psychology? Explain.
1. Do certain item formats prepare students for the “real world” better than others? Why or why not?
2. What are likely some of the arguments put forth by those who reject constructed-response testing in universities? To what degree do you feel these arguments are valid?
3. In a university setting, why might some departments likely champion free-response item formats while other departments prefer constructed-response formats?
4. What role does politics play in the choice of adoption of item formats in a college classroom?
5. What role should practical concerns (such as class size) play in the determination of item formats?
6. Can constructed-response items assess higher-level cognitive objectives?
7. What item formats do you prefer to be tested with? Why?
8. Based on your own personal experience and observations, in what ways does the choice of item format influence student test preparation?
EXERCISE 12.1: DETERMINATION OF TEST COMPOSITION
OBJECTIVE: To gain practice developing test specifications for knowledge tests. In developing knowledge tests, many important decisions must be made to
EXERCISE 12.2: WRITING ITEMS TO ASSESS KNOWLEDGE
OBJECTIVE: To develop high-quality items to assess knowledge of a specific domain. Students sometimes complain that items on a test are vague,
1. Be sure to interpret item difficulty, item discrimination, and overall test statistics based on the context of the testing situation (i.e., norm referenced vs. criterion referenced). (A brief computational sketch of these statistics follows question 8 below.)
2. Increasingly, best practices in educational environments include “real-time” formative assessment, such as the use of student response pads (i.e., clickers) in order to improve both instruction
3. Every attempt should be made to revise an item before you summarily delete it because of poor item analysis statistics.
4. Polytomous scoring of test items (i.e., weighting each response rather than simply dichotomizing responses as 1 = correct and 0 = incorrect) provides much more information from each item and thus
7. Oftentimes in a classroom environment, you might have more students (respondents) than you have items. Does this pose a problem for interpreting your item analysis statistics?
8. What corrections, if any, might you make to items 1, 2, 4, 5, and 8 based on the information provided in Table 13.2?
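The item statistics referenced above can be computed with a few lines of code. The following is a minimal sketch, not the textbook's own materials or data: it assumes an invented 0/1-scored response matrix and computes the difficulty index (proportion correct) and a corrected item-total point-biserial correlation as the discrimination index.

# Minimal sketch: item difficulty and discrimination for dichotomously
# scored items. The response matrix below is invented for illustration only.
import numpy as np

# rows = examinees, columns = items; 1 = correct, 0 = incorrect
responses = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
])

total_scores = responses.sum(axis=1)

for j in range(responses.shape[1]):
    item = responses[:, j]
    difficulty = item.mean()                        # p-value: proportion answering correctly
    rest = total_scores - item                      # total score with this item removed
    discrimination = np.corrcoef(item, rest)[0, 1]  # corrected item-total (point-biserial) correlation
    print(f"Item {j + 1}: difficulty = {difficulty:.2f}, discrimination = {discrimination:.2f}")

With only six simulated examinees the values are very unstable, which is exactly the small-sample concern raised in question 7.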
1. What might explain the pattern of results that Andrew observed for the different job classifications?
2. Given the differing results by job classification, should the same test still be used for all the job classifications? What key issues should Andrew consider?
3. What might be unique about the unskilled and semiskilled job candidates in the Midwest as compared to their counterparts in the West, South, and East?
4. What do you think would have happened if Andrew had not separated the data by job classification and region?
5. Andrew focused primarily on the difficulty index. What other item-level statistics should he compute? What unique information would they provide?
1. How did students seem to do based on the five items presented in Table 13.3?
2. Based only on the information presented in Table 13.3, what revisions should Linda make to each item?
3. Do you have a concern that Linda had 100 items but only 21 subjects? What problems might this cause in interpreting her item analysis results?
4. Why do you think .000s are printed for all the entries in item 5, as well as for some options in the other items?
5. Which item would you say is the “best” item? Why?
6. Are there any items that Linda should simply throw out (i.e., they are just not worth spending the time revising)?
7. What additional information would be helpful in evaluating the test items?
8. Is there a problem with using graduate students during the pilot-testing phase if the test will eventually be used as an outcomes assessment device for undergraduates?
EXERCISE 13.1: ITEM ANALYSIS OF AN ORGANIZATIONAL BEHAVIOR TEST
OBJECTIVE: To practice evaluating items using item analysis statistics. Selected items (13 to be exact) from a 50-item multiple-choice
1. It is best to use some combination of informed, expert judgment as well as empirical data to make decisions regarding setting cutoff scores and pass points.
2. When using expert judgments, it is best to provide feedback to the SMEs regarding the other raters’ ratings (e.g., a Delphi technique).
3. Be sure to seriously consider not only psychometric criteria and expert judgment in setting cutoff scores and pass points, but also practical considerations such as the size of the applicant pool
1. How do we best define the MCP when using judgmental methods such as the Angoff, Nedelsky, and Ebel methods? (A worked Angoff sketch follows these questions.)
7. What if we set a cutoff score and no one passes?
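As a reminder of the arithmetic behind the Angoff method referenced above: each SME estimates, for every item, the probability that a minimally competent person (MCP) would answer it correctly; the estimates are averaged across SMEs and summed across items to yield the recommended cutoff. The sketch below uses invented ratings purely for illustration, not data from the textbook.

# Minimal Angoff sketch with hypothetical SME ratings (not real data).
import numpy as np

# rows = SMEs, columns = items; each entry is the judged probability that
# a minimally competent person answers the item correctly
angoff_ratings = np.array([
    [0.90, 0.70, 0.55, 0.80, 0.60],
    [0.85, 0.65, 0.60, 0.75, 0.50],
    [0.95, 0.75, 0.50, 0.85, 0.65],
])

item_means = angoff_ratings.mean(axis=0)  # average judged probability per item
cutoff = item_means.sum()                 # expected raw score of the MCP = recommended cutoff

print("Mean rating per item:", np.round(item_means, 2))
print(f"Recommended cutoff (out of {angoff_ratings.shape[1]} items): {cutoff:.1f}")

The Nedelsky method follows the same summing logic, except each item's value is the reciprocal of the number of options the MCP could not eliminate as clearly wrong.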
1. If you were Alexius, where would you start your search for “best practices” for setting cutoff scores on a graduate comprehensive examination?
2. While a purely empirical method for setting the cutoff scores seems unrealistic given the small sample sizes, what things could Alexius do to make his judgmental procedures more empirical?
3. Who are the likely SMEs for setting the cutoff scores for the general test? The area-specific tests?
4. Is there a problem with the same individuals writing the questions and also helping to set the cutoff scores on the test they created?
5. Would information from past “pass rates” be of any use to the committee given that the format is being changed?
1. Should Jasmin recommend the same cutoff score for each university, or should different cutoff scores be recommended?
2. Instead of having one overall cutoff score, might it be better to have separate cutoff scores for different portions of the exam?
3. Should Jasmin take into consideration the other criteria used by each of the universities to select its students? If so, how?
4. Given the large sample of data Jasmin will have to work with, how might she incorporate some empirical data into the cutoff score decision?
5. Who should be the SMEs for Jasmin in helping her to set the cutoff score(s)?
EXERCISE 14.1: JUDGMENTAL PROCEDURES FOR SETTING CUTOFF SCORES
OBJECTIVE: To practice setting cutoff scores using the Angoff and Nedelsky methods.
SCENARIO: The psychology department has decided to
EXERCISE 14.2: DELPHI METHOD FOR SETTING CUTOFF SCORES
OBJECTIVE: To practice using a judgmental/empirical method for setting cutoff scores. This exercise requires that you first complete the steps in
1. Did you find the summary ratings helpful to you as you reviewed your initial set of ratings?
2. If you changed some of your ratings, why did you make changes?
3. Did you notice any patterns in your ratings compared to the summary ratings? For example, did you tend to be more stringent or lenient in your ratings than the other raters?
EXERCISE 14.3: CONTRASTING GROUPS METHOD FOR SETTING CUTOFF SCORES
OBJECTIVE: To practice using empirical procedures to make passing score decisions. The data set “Passing Score.sav” (see
1. Based on your line graph, where would you set the passing score? (Compare your answer with the numerical sketch following these questions.)
2. Can a case be made for more than one passing score (similar to the step-by-step example)?
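The contrasting groups logic in Exercise 14.3 can be sketched numerically as well as graphically: compare the score distributions of a group judged competent ("masters") and a group judged not yet competent ("non-masters"), and place the passing score where the two distributions cross, i.e., where total misclassification is smallest. The sketch below uses invented scores, not the “Passing Score.sav” data.

# Minimal contrasting-groups sketch with invented scores (not the .sav data).
import numpy as np

masters     = np.array([78, 82, 85, 74, 90, 88, 80, 76, 84, 79])  # judged competent
non_masters = np.array([62, 70, 68, 55, 73, 66, 71, 60, 65, 69])  # judged not competent

best_cut, fewest_errors = None, None
for cut in range(int(min(non_masters)), int(max(masters)) + 1):
    false_fail = np.sum(masters < cut)       # masters who would fail at this cutoff
    false_pass = np.sum(non_masters >= cut)  # non-masters who would pass
    errors = false_fail + false_pass
    if fewest_errors is None or errors < fewest_errors:
        best_cut, fewest_errors = cut, errors

print(f"Passing score minimizing misclassification: {best_cut} ({fewest_errors} errors)")

In practice you would also plot the two frequency distributions (the line graph in question 1) and weigh whether false passes or false failures are more costly before settling on a single cutoff.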
1. Development of a measure of typical performance begins once again with thorough test specifications.
2. Recognize that attitude and survey measurement is part of a social exchange. Respondents’ decisions to participate in such measurement, and their interpretation of the items presented to them,
3. Consult expert recommendations on the design and implementation of measures of typical performance. At the same time, use your own judgment of which rules, admonitions, and tips apply to your
4. Always pilot test a measure prior to full implementation.
3. In assessing someone’s opinion, when might you prefer to use a free-response item format? When might you prefer to use a constructed-response item format?
1. Why did Hathaway and McKinley begin with such a large pool of potential items?
2. In what ways would item selection have differed if the original MMPI had been rationally developed?
3. Discuss the degree to which you feel the choice of criterion was appropriate for the MMPI.
4. Why did Hathaway and McKinley cross-validate the clinical scales?
5. Why would the process used to develop the MMPI be advantageous for diagnosing clinical patients?
6. Why would the revision of the MMPI include a somewhat more theoretical approach to test development?
7. The MMPI has sometimes been used in the selection of new employees. Is this an appropriate use of the test? Why or why not?
1. What special challenges might there be for defining a newly conceptualized construct such as joinership?
2. How would measurement of a personality trait differ from self-reported behavior? What implications would this have for the development of the scale?
3. A thorough test specification would likely discuss constructs that were similar to, but distinct from, the construct assessed by the measure. What constructs might be used to compare and contrast
4. How important is it to define the context in which the scale is to be used? For what purposes could the joinership scale be used?
5. Explain how theory provided assistance in the development of the joinership scale. What role should theory play in the development of a psychological measure?
6. Now that the group has decided on the dimensionality of the construct, how should item writing proceed?
EXERCISE 15.1: IMPROVING SURVEY ITEMS
OBJECTIVE: To identify and correct poorly written survey items. Each of the items below shares the following five-point Likert-type response scale: For each item,
EXERCISE 15.2: EXAMINING SME RATINGS OF THE JOINERSHIP ITEMS
OBJECTIVE: To refine the draft joinership scale using information provided by SME ratings. Use the data file “Joinership Rational
EXERCISE 15.3: FACTOR ANALYZING THE JOINERSHIP ITEMS
OBJECTIVE: To examine the dimensionality of the remaining joinership scale items. (Note: If you have not yet covered Module 18, your instructor
EXERCISE 15.4: EXAMINING THE RELIABILITY OF THE JOINERSHIP SUBSCALES
OBJECTIVE: To develop subscales of joinership with high internal-consistency reliability.
1. Using only those items retained following Exercises 15.2 and 15.3, conduct a reliability analysis of each dimension of the scale (as represented by the factors that emerged in Exercise 15.3) (
2. Examine the output for each reliability analysis. Compare the obtained alpha with the alpha estimated if each particular item was deleted. Would the alpha increase if an item were deleted from the
3. Once the alphas of each dimension of the scale have been determined, compute the alpha of the overall scale. (A sketch of the alpha computation follows.)
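The reliability analysis in these steps would normally be run in SPSS, but the computation behind coefficient alpha is simple enough to sketch directly. The example below uses a small invented item-score matrix standing in for one joinership subscale (not the actual data file) and the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

# Minimal Cronbach's alpha sketch with invented Likert-type responses
# (a stand-in for one joinership subscale, not the actual data file).
import numpy as np

items = np.array([          # rows = respondents, columns = items (1-5 Likert scale)
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 3, 4],
    [3, 2, 3, 2],
])

k = items.shape[1]
item_variances = items.var(axis=0, ddof=1)      # sample variance of each item
total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale score
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"Cronbach's alpha for this {k}-item subscale: {alpha:.2f}")

The "alpha if item deleted" values asked about in step 2 can be reproduced by recomputing alpha on the matrix with that item's column dropped.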
1. Any use of correction-for-guessing procedures should be well justified and carefully implemented. (The standard formula is sketched after this list.)
2. It is always best to find ways to prevent response biases, rather than simply waiting until the data is collected to identify and deal with them post hoc.
3. It is important to distinguish between response biases (context dependent) and response styles (universal traits) and to recognize the need to deal with them in different ways.
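The standard correction-for-guessing formula referred to in tip 1 (and in Exercise 16.1 below) is FS = R - W/(k - 1), where R is the number of items answered correctly, W the number answered incorrectly (omitted items are ignored), and k the number of response options per item. A minimal sketch with hypothetical numbers:

# Minimal correction-for-guessing sketch (hypothetical values, 4-option items).
def corrected_score(num_right: int, num_wrong: int, num_options: int) -> float:
    """Formula score: right minus wrong divided by (options - 1); omits are ignored."""
    return num_right - num_wrong / (num_options - 1)

# Example: 50-item test with 4 options per item; 38 right, 9 wrong, 3 omitted
print(corrected_score(num_right=38, num_wrong=9, num_options=4))  # prints 35.0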
1. What alternative explanation (besides cheating) do you think might explain the low variability in the afternoon section?
2. What might Ryan have done differently to reduce the possible “cheating factor”?
3. Would using a correction-for-guessing formula help Ryan in any way? If so, how?
4. Are there other statistical corrections Ryan could institute to correct for the low variability?
5. Is the low variability in test scores really a problem in a classroom situation such as this?
1. If you were Dora, would you use the surveys from her friend?
2. Would the data still be useful to Dora, assuming working parents did, in fact, complete the surveys from her friend?
3. Again, assuming that the data are, in fact, legitimate, what response bias seems to be happening here?
4. Are there any statistical corrections that can be made to the data to make them useful?
5. If Dora were a fellow student colleague and friend, what suggestions would you provide to her with regard to collecting more data?
EXERCISE 16.1: CORRECTION FOR GUESSING IN MULTIPLE-CHOICE KNOWLEDGE TESTS—COMPUTER EXERCISE
OBJECTIVE: To practice using the correction-for-guessing formula discussed in the module overview with