Question:
Andrew, a third-year graduate student, was enrolled in a PhD program in quantitative psychology. He had recently obtained a highly competitive summer internship with a Fortune 500 company in its employment testing section. As one of his first assignments, his new supervisor asked Andrew to review the item analysis statistics for a short 25-item timed test of general mental ability (GMA) that the company administers to thousands of job candidates every year. Test scoring is conducted and processed within four regional centers (East, South, West, and Midwest). Therefore, before combining all the regions, Andrew decided to first examine the item statistics within each region by each of five broad job classifications (i.e., administrative/professional, clerical, skilled craft, semiskilled, and unskilled/laborer).
After completing and reviewing the initial set of item analyses, Andrew noticed an interesting pattern. The first ten items had very good item analysis statistics for the clerical and semiskilled positions, but not for the other job classifications. In particular, an extremely high percentage (more than 98%) of the administrative/professional candidates and 88% of the skilled craft candidates answered the first ten questions correctly, while very few (less than 10%) of the unskilled/laborer job candidates did. As a result, the item discrimination indexes for these job classes were near zero. For items 11–19, the item analysis statistics were still unfavorable for the administrative/professional and unskilled/laborer candidates, but were much more favorable for the skilled craft candidates. Finally, for items 20–25 the item analysis statistics were favorable for the administrative/professional candidates, but very few of the other candidates were even able to attempt these items. As a result, their p values were extremely low and their item discrimination indexes were near zero. To top it all off, this pattern seemed to hold for three of the four regions, but the midwestern region seemed to be getting very different results. In particular, the unskilled and semiskilled job candidates there appeared to be doing significantly better on the early items than their counterparts in other regions of the country. Somewhat perplexed, Andrew decided it was time to discuss things with his new supervisor.
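The statistics Andrew is reviewing can be made concrete with a short sketch. Below is a minimal, illustrative computation of the two item-level quantities the case mentions: the item difficulty (the p value, i.e., the proportion of candidates answering correctly) and an item discrimination index. The function name `item_statistics` and the choice of the corrected item-total (point-biserial) correlation as the discrimination index are assumptions for illustration; the company's actual scoring system is not described in the case.

```python
import numpy as np

def item_statistics(responses):
    """Compute item difficulty (p) and a discrimination index per item.

    responses: 2D array of 0/1 item scores, shape (n_candidates, n_items).
    Difficulty is the proportion answering the item correctly.
    Discrimination here is the corrected item-total correlation: the
    correlation between the item score and the total score on the
    *remaining* items (excluding the item itself to avoid inflation).
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    p_values = responses.mean(axis=0)           # item difficulty (p values)
    total = responses.sum(axis=1)               # each candidate's total score
    discrimination = np.zeros(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]          # total score without item j
        # When nearly everyone passes (or fails) an item, its variance is
        # zero and the correlation is undefined -- exactly the situation
        # Andrew saw for the administrative/professional candidates.
        if responses[:, j].std() == 0 or rest.std() == 0:
            discrimination[j] = 0.0
        else:
            discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return p_values, discrimination
```

Note how an item answered correctly by all candidates in a group (p = 1.0, like the first ten items for administrative/professional candidates) necessarily has a discrimination index of zero: with no variance in the item score, it cannot distinguish stronger from weaker candidates within that group.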
Questions
1. What might explain the pattern of results that Andrew observed for the different job classifications?
2. Given the differing results by job classification, should the same test still be used for all the job classifications? What key issues should Andrew consider?
3. What might be unique about the unskilled and semiskilled job candidates in the Midwest as compared to their counterparts in the West, South, and East?
4. What do you think would have happened if Andrew had not separated the data by job classification and region?
5. Andrew focused primarily on the difficulty index. What other item-level statistics should he compute? What unique information would they provide?
Measurement Theory In Action
ISBN: 9780367192181
3rd Edition
Authors: Kenneth S Shultz, David Whitney, Michael J Zickar