Kaiser, R.B., & Craig, S.B. (2006). Bad items, bad data: Item characteristics and rating discrepancies in multi-source assessments. In S. Reddy (Ed.), Perspectives on Multirater Performance Assessment (Ch. 5, pp. 76-91). Nagarjuna Hills, Hyderabad, India: ICFAI University Press.
The authors hypothesized that the low convergence typical of multi-source performance ratings is a function of the quality of the items used in these instruments. Treating rating items as "stimuli" to raters, they identify three linguistic characteristics that can lead raters to attach different meanings to the same item. These different interpretations can, in turn, produce discrepant ratings from multiple judges of the same target. The results indicate that inter-rater reliability is lower for items that describe complex abstractions of behavior and for items that refer to more than one discrete behavior. By contrast, inter-rater reliability is higher for items that describe less descriptively complex dispositional qualities, and for items that are grammatically complex enough to provide contextual cues yet focus on only one specific behavior. Authors of multi-rater instruments can therefore improve the quality of the feedback their tools provide by paying closer attention to how the items are written.
Article in copyrighted book.