Assignment Task:
Studies often use raters and observers because they are an effective way to code behaviors, track changes, and assess qualitative data. In theory, well-trained raters working from well-defined parameters should rarely disagree, but that is not always the case. Using raters carries its own set of challenges, as people can bring their own biases into a study, which affects the results. Measures of interrater reliability help assess the raters, quantify the degree of consistency across them, and build confidence in the results. Bordens and Abbott (2022) note that interrater reliability helps establish procedures that can be reproduced, allows a study to meet established standards of reliability, and can help identify problems.
One scenario in which observers are used in research is behavioral observation studies. For example, Jeong et al. (2025) created an observational tool for viewing mother-child interactions during a five-minute shared picture-book activity. The researchers created a list of nineteen items to track for both mother and child. Another scenario in which raters are used is diagnosis and clinical interviews. Becker et al. (2024) recorded interviews with both caregivers and children (using different scales) to evaluate children for cognitive disengagement syndrome; the recordings were then given to raters to review and assign a diagnosis. Raters can also be used to assess qualitative data, such as comparing job candidates against one another by rating each candidate on specified criteria.
Bordens and Abbott (2022) note that Cohen's Kappa is a popular method of assessing interrater reliability, as it provides a measure of agreement between two observers on a nominal scale. The statistic compares the observed agreement between observers with the proportion of agreement expected by chance. A Cohen's Kappa of .70 or higher indicates good interrater reliability. The authors also note that Pearson's product-moment correlation (Pearson r) is a useful measure of interrater reliability because, unlike Cohen's Kappa, its statistical significance can be determined easily. Hilton et al. (2024) noted that intraclass correlations (ICC) and percent agreement are effective ways to measure interrater reliability. The ICC is used for observations on an interval or ratio scale of measurement and takes an analysis-of-variance approach to reliability.
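To make these computations concrete, the following is a minimal Python sketch; the observer codes and ratings are invented for illustration. It calculates Cohen's Kappa directly from the observed and chance agreement described above, and uses scipy's pearsonr for a Pearson r on hypothetical interval-scaled ratings.

import numpy as np
from scipy.stats import pearsonr

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters coding the same items on a nominal scale."""
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    categories = np.union1d(rater_a, rater_b)
    # Observed proportion of agreement
    p_o = np.mean(rater_a == rater_b)
    # Agreement expected by chance: product of each rater's marginal proportions, summed over categories
    p_e = sum(np.mean(rater_a == c) * np.mean(rater_b == c) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical nominal codes from two observers of ten behaviors ("on-task" vs. "off-task")
obs_1 = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
obs_2 = ["on", "off", "off", "on", "off", "on", "on", "on", "on", "on"]
print(f"Cohen's kappa = {cohens_kappa(obs_1, obs_2):.2f}")  # .70 or higher is typically read as good agreement

# Hypothetical interval-scaled ratings (e.g., 1-7 quality scores) from two raters
scores_1 = [5, 6, 4, 7, 3, 5, 6]
scores_2 = [5, 5, 4, 6, 3, 5, 7]
r, p_value = pearsonr(scores_1, scores_2)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")  # significance of r is straightforward to test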
References:
Becker, S. P., Dunn, N. C., Fredrick, J. W., McBurnett, K., Tamm, L., & Burns, G. L. (2024). Cognitive Disengagement Syndrome-Clinical Interview (CDS-CI): Psychometric support for caregiver and youth versions. Psychological Assessment, 36(10), 618-630.
Bordens, K. L., & Abbott, B. B. (2022). Research design and methods: A process approach. McGraw Hill.
Hilton, N. Z., Hanson, R. K., Campbell, M. A., & Jung, S. (2024). Police and researcher use of the Ontario Domestic Assault Risk Assessment (ODARA): Interrater agreement and examination of published norms. Journal of Threat Assessment and Management.
Jeong, J., Mapendo, F., Hentschel, E., McCann, J. K., & Yousafzai, A. K. (2025). Validation of an observational tool for assessing mother-child and father-child interactions in Mara, Tanzania. Developmental Psychology.
Kline, T. J. B. (2005). Psychological testing: A practical approach to design and evaluation. Sage Publications.