Abstract

At the level of manifest categorical variables, a large number of coefficients and models for the examination of rater agreement have been proposed and used. The most popular of these is Cohen's κ. In this article, a new coefficient, κ_s, is proposed as an alternative measure of rater agreement. Both κ and κ_s allow researchers to determine whether agreement in groups of two or more raters is significantly beyond chance. Stouffer's z is used to test the null hypothesis that κ_s = 0. In addition to evaluating rater agreement in a fashion parallel to κ, the coefficient κ_s allows one to (1) examine subsets of cells in agreement tables, (2) examine cells that indicate disagreement, (3) consider alternative chance models, (4) take covariates into account, and (5) compare independent samples. Results from a simulation study are reported, which suggest that (a) the four measures of rater agreement (Cohen's κ, Brennan and Prediger's κ_n, raw agreement, and κ_s) are sensitive to the same data characteristics when evaluating rater agreement and (b) both the z-statistic for Cohen's κ and Stouffer's z for κ_s are unimodally and symmetrically distributed, but slightly heavy-tailed. Examples use data from verbal processing and applicant selection.
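As a rough illustration of the established quantities named in the abstract, the following sketch computes raw agreement, Cohen's κ, and Brennan and Prediger's κ_n for a square two-rater agreement table, and combines a set of z-values with Stouffer's method. The abstract does not define κ_s, so it is not implemented here. The function names and the example table are assumptions made for illustration only, and the Stouffer function is the standard unweighted combination, not necessarily the exact test used in the article.

```python
import math

def raw_agreement(table):
    """Proportion of cases on the main diagonal of a square agreement table."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n

def cohen_kappa(table):
    """Cohen's kappa: chance agreement is estimated from the raters' marginals."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p_o = raw_agreement(table)
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row_p[i] * col_p[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)

def brennan_prediger_kappa(table):
    """Brennan and Prediger's kappa_n: chance agreement is fixed at 1/k."""
    k = len(table)
    p_o = raw_agreement(table)
    return (p_o - 1 / k) / (1 - 1 / k)

def stouffer_z(z_values):
    """Unweighted Stouffer combination: sum of z-values over sqrt of their count."""
    return sum(z_values) / math.sqrt(len(z_values))

# Hypothetical 2 x 2 agreement table (rows: rater A, columns: rater B).
example = [[30, 5],
           [5, 10]]

print(raw_agreement(example))
print(cohen_kappa(example))
print(brennan_prediger_kappa(example))
print(stouffer_z([1.0, 2.0, 3.0]))
```

In this hypothetical table the two chance-corrected coefficients differ because the marginals are not uniform: Cohen's κ uses the observed marginal proportions for its chance model, whereas κ_n assumes every category is equally likely.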

References

Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: Wiley.
Banerjee, M., Capozzoli, M., McSweeney, L., Sinha, D. (1999). Beyond κ: A review of rater agreement measures. The Canadian Journal of Statistics, 27, 3–23.
Barlow, W. (1996). Measurement of interrater agreement with adjustment for covariates. Biometrics, 52, 695–702.
Barnhart, H.X., Williamson, J.M. (2002). Weighted least-squares approach for comparing correlated κ. Biometrics, 58, 1012–1019.
Brennan, R.L., Prediger, D.J. (1981). Coefficient κ: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Darlington, R.B., Hayes, A.F. (2000). Combining independent p values: Extensions of the Stouffer and binomial methods. Psychological Methods, 5, 496–515.
Donner, A., Klar, N. (1996). The statistical analysis of κ statistics in multiple samples. Journal of Clinical Epidemiology, 49, 1053–1058.
Donner, A., Zhou, G. (2002). Interval estimation for a difference between intraclass κ statistics. Biometrics, 58, 209–215.
Feinstein, A.R., Cicchetti, D.V. (1990). High agreement but low κ I: The problems of two paradoxes. Journal of Clinical Epidemiology, 43, 543–549.
Fleiss, J.L. (1975). Measuring agreement between two judges in the presence or absence of a trait. Biometrics, 31, 651–659.
Fleiss, J.L. (1981). Statistical methods for rates and proportions (2nd ed.). New York: Wiley.
Fleiss, J.L., Cohen, J., Everitt, B.S. (1969). Large sample standard errors of κ and weighted κ. Psychological Bulletin, 72, 323–327.
Fleiss, J.L., Levin, B., Paik, M.C. (2003). Statistical methods for rates and proportions (3rd ed.). New York: Wiley.
Goodman, L.A. (1965). On simultaneous confidence intervals for multinomial proportions. Technometrics, 7, 247–254.
Goodman, L.A. (1979). Simple models for the analysis of association in cross-classifications having ordered categories. Journal of the American Statistical Association, 74, 537–552.
Goodman, L.A., Kruskal, W.H. (1954). Measures of association for cross-classifications. Journal of the American Statistical Association, 49, 732–764.
Guggenmoos-Holzmann, I. (1995). Modeling covariate effects in observer agreement studies: The case of nominal scale agreement (letter to the editor). Statistics in Medicine, 14, 2285–2286.
Hildebrand, D.K., Laing, J.D., Rosenthal, H. (1977). Prediction analysis of cross-classifications. New York: Wiley.
Keselman, H.J., Cribbie, R., Holland, B. (1999). The pairwise multiple comparison multiplicity problem: An alternative approach to familywise and comparisonwise Type I error control. Psychological Methods, 4, 58–69.
Klar, N., Lipsitz, S.R., Ibrahim, J. (2000). An estimating equation for modeling κ. Biometrical Journal, 42, 45–58.
Landis, J.R., Koch, G.G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Microsoft Corporation. (1995). Microsoft(R) Fortran PowerStation. Version 4.0.
Park, S.K., Miller, K.W. (1988). Random number generators: Good ones are hard to find. Communications of the Association for Computing Machinery, 31, 1192–1201.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling, W.T. (1989). Numerical recipes. The art of scientific computing (FORTRAN version). Cambridge: Cambridge University Press.
Schuster, C., Smith, D.A. (2002). Indexing systematic rater agreement with a latent class model. Psychological Methods, 7, 384–395.
Schuster, C., von Eye, A. (2001). Models for ordinal agreement data. Biometrical Journal, 43, 795–808.
Stouffer, S.A., Suchman, E.A., DeVinney, L.C., Star, S.A., Williams, R.M. Jr. (1949). The American soldier: Adjustment during Army life (vol. 1). Princeton, NJ: Princeton University Press.
Tanner, M.A., Young, M.A. (1985). Modeling agreement among raters. Journal of the American Statistical Association, 80, 175–180.
Uebersax, J.S. (1993). Statistical modeling of expert ratings on medical treatment appropriateness. Journal of the American Statistical Association, 88, 421–427.
von Eye, A. (2002). Configural Frequency Analysis - Methods, models, applications. Mahwah, NJ: Erlbaum.
von Eye, A., Brandtstädter, J. (1988). Application of prediction analysis to cross-classifications of ordinal data. Biometrical Journal, 30, 651–655.
von Eye, A., Jacobson, L.P., Wills, S.D. (1990, July). Proverbs: Imagery, interpretation, and memory. 12th West Virginia University Conference on Life-Span Developmental Psychology, Morgantown, WV.
von Eye, A., Mun, E.Y. (2005). Analyzing rater agreement - Manifest variable models. Mahwah, NJ: Erlbaum.
von Eye, A., Schuster, C. (2000). Log-linear models for rater agreement. Multiciência, 4, 38–56.
von Eye, A., Sörensen, S. (1991). Models of chance when measuring interrater agreement with κ. Biometrical Journal, 33, 781–787.
Wickens, T. (1989). Multiway contingency tables analysis for the social sciences. Hillsdale, NJ: Erlbaum.

Volume 11, Issue 1, March 2006

ISSN: 1016-9040, eISSN: 1878-531X

Acknowledgments:

The author is indebted to Richard P. DeShon and Neal Schmitt for helpful comments on earlier versions of this article.

An Alternative to Cohen's κ